Conceptual Model of Process Formation for the Semantics of Sentence in Natural Language

Oleg Bisikalo [0000-0002-7607-1943]1, Victoria Vysotska [0000-0001-6417-3689]2, Yevhen Burov [0000-0001-8653-1520]3, Petro Kravets [0000-0001-8569-423X]4

1 Vinnytsia National Technical University, Vinnytsia, Ukraine
2-4 Lviv Polytechnic National University, Lviv, Ukraine
obisikalo@gmail.com1, Victoria.A.Vysotska@lpnu.ua2, Yevhen.V.Burov@lpnu.ua3, petro.o.kravets@lpnu.ua4

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The article explains the use of generative grammars in linguistic modeling. It describes the syntactic modeling of sentences used to automate the analysis and synthesis of natural-language texts, and proposes content analysis methods for an online newspaper.

Keywords: Content, Generative Grammars, Sentence Structure, NLP, Information Resource, Content Analysis, Computer Linguistic System, Content Management System

1 Introduction

Generative grammar theory is an effective tool for syntactic-level linguistic modeling. The foundation of this theory was laid in the works of the linguist N. Chomsky, where formal analysis of the grammatical structure of phrases is used to distinguish the syntactic structure (the constituents) as the basic scheme of a phrase, regardless of its meaning [38-41, 49-57]. A. Gladkiy applied the concepts of dependency trees and constituent systems to model the syntactic level of language [14-16]. He suggested a way to model syntax using syntactic groups that distinguish word constituents as units for constructing a dependency: a representation that combines the benefits of the immediate-constituent method and of dependency trees.

2 Analysis of Recent Research and Publications

Active development of the Internet contributes to the creation of various linguistic resources. The need to implement processes of analysis and synthesis of natural-language texts has led to the emergence of corresponding linguistic models of their processing [2-5, 8-9, 12-19, 21-24, 26-29, 32-33, 37-41, 43, 46, 48-59, 61-63, 68]. The need for such models arises in the development of many linguistic disciplines serving the information sciences. Integration processes in most areas of the modern world attract particular attention to the development and creation of automated multilingual information-processing systems.

A formal generative grammar G is a quadruple G = (V, T, S, P), where V is a finite non-empty set, the alphabet (dictionary); T is its subset whose elements are the terminal (basic) symbols, the terminals; S is the initial symbol (S ∈ V); P is a finite set of productions (transformation rules) of the form ξ → η, where ξ and η are chains over V. The set V \ T is denoted by N; its elements are the non-terminal (auxiliary) symbols [14-16]. We will interpret terminal symbols as word forms (of some natural language), non-terminal symbols as syntactic categories, and derivable terminal chains as correct sentences of this language [12-16, 38-41, 49-57]. The derivation of a sentence is then naturally interpreted as its syntactic structure, presented in terms of immediate constituents, that is, by a method long known in linguistics [10, 44, 46-47]. The research of N. Chomsky and A. Gladkiy was developed and continued by A. Anisimov [2-3], Y. Apresyan [4-5], N. Bilgaeva [8], E. Bolshakova, E. Klishinsky, D. Lande, A. Noskov, O. Peskov and E. Yagunova [9], I. Volkova and
T. Rudenko [11], O. Gakman [12], A. Gerasimov [13], M. Gross and A. Lanten [17], N. Darchuk [18], I. Demeshko [19], V. Ingve [21, 65-66], T. Lyubchenko [22], B. Martynenko [23], O. Marchenko [24], E. Paducheva [26], Z. Partiko [27], A. Pentus and M. Pentus [28], E. Popov [29], G. Potapova [32], N. Rusachenko [33], V. Fomichev [37], S. Sharov [43], Y. Shcherbina [46], Y. Shreyder [48], Y. Bar-Hillel and E. Shamir [58], D. Bobrov [59], D. Hays [61], P. Postal [62], L. Tesnière [63], D. Varga [68]. These studies underpin the development of NLP tools such as text annotation, machine translation systems, information retrieval systems, morphological, syntactic and semantic text analysis, educational didactic systems, linguistic support of specialized software systems, etc. [18-27, 43, 67-89].

3 Formation of the Purpose

In this article we show how to use the apparatus of generative grammars to model sentence syntax for different natural languages, such as English, German, Russian and Ukrainian. To this end we analyze the syntactic structure of sentences and demonstrate the features of the process of sentence synthesis in these languages, considering the influence of the rules and regularities of a language on the construction of grammars [10, 44, 46-47].

4 Analysis of Achieved Scientific Results

A generative grammar G is a quadruple G = (V, T, S, P), where V is a finite non-empty set, the alphabet (dictionary); T is its subset whose elements are terminal (basic) lexical units, the terminals; S is the initial symbol (S ∈ V); P is a finite set of productions (transformation rules) of the form ξ → η, where ξ and η are chains over V. The set V \ T is denoted by N; its elements are non-terminal (auxiliary) lexical units [14-16]. Grammars are classified by the types of their productions, subject to certain restrictions (Table 1) [14-16, 38-41, 49-57]. The vocabulary V consists of a finite non-empty set of lexical units [60]. An expression over V is a finite-length chain of lexical units of V; the empty chain, containing no lexical units, is denoted by Λ. The set of all chains over V is denoted by V*. A language over V is a subset of V*. A language can be given through the set of all its expressions, or through a criterion that expressions must satisfy in order to belong to the language [14-16, 38-41, 49-57]. Another important way to define a language is through a generative grammar. A grammar consists of a set of lexical units of different types and a set of rules, or productions, for constructing expressions. A grammar has a dictionary V, the set of lexical units for constructing language expressions. Some (terminal) vocabulary units cannot be replaced by other vocabulary units.

Table 1. Classification of grammars by production type

G0 (unbounded): productions ξ → η, where ξ is an arbitrary chain containing at least one non-terminal symbol and η is an arbitrary chain over V.
G1 (context-dependent): if the set of productions P contains a production of the form ξ1 A ξ2 → ξ1 η ξ2 (but not of the form A → η), then A can be replaced by η only in the environment of the chains ξ1 and ξ2, i.e. in the relevant context.
G2 (context-free): productions A → η; the non-terminal A on the left-hand side may be replaced by the chain η in an arbitrary environment wherever it occurs, i.e. regardless of context.
G3 (regular): only productions A → aB, A → a, S → Λ, where A and B are non-terminals, a is a terminal, and Λ is the empty chain.
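To make the G2 case of Table 1 concrete, here is a minimal sketch of a context-free grammar in Python. The toy rules and lexicon are illustrative assumptions, not taken from the paper; a symbol with no rule of its own is treated as a terminal, and derivation is a simple top-down expansion:

```python
import random

# A context-free (G2) grammar as a dictionary: each non-terminal maps to a
# list of alternative right-hand sides. All rule content is illustrative.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "A", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "A":   [["simple"], ["informational"]],
    "N":   [["user"], ["request"], ["resource"]],
    "V":   [["sends"], ["views"]],
}

def derive(symbol: str) -> list[str]:
    """Expand one symbol; symbols without rules of their own are terminals."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    words = []
    for s in production:
        words.extend(derive(s))
    return words

print(" ".join(derive("S")))   # e.g. "the simple user views a request"
```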
A distinctive feature of context-free grammars G2 is that at each derivation step exactly one symbol is processed, so the presence, absence or properties of neighbouring symbols cannot be taken into account in any way. This may give the impression that grammars G2 are not suitable for describing natural languages: in conventional grammars, statements about the choice of forms or variants of particular elements of an expression are usually formulated with contextual conditions in mind. Thus, in describing inflectional forms one indicates the flexion that must be selected depending on the type of stem (acting as context); in descriptions of Ukrainian usage it is noted that the accusative case of the direct object is replaced by the genitive in the presence of negation; a subject is possible with a verbal noun only if the noun also has an object in the genitive case (viewing of content by the user, but not *viewing by the user), etc. [7, 10, 18-20, 32-33, 36, 45]. Nevertheless, grammar G2 is practically capable of generating the vast majority of simple and complex sentences of natural languages, and the statement holds all the more for less restricted grammars. In almost all cases where the use of context seems at first sight inevitable, it can in fact be dispensed with. For example, let there be a class of elements X such that in the neighbourhood of elements of a class Y the elements X behave differently than in the neighbourhood of elements of a class Z, by the following rules: YX → YAB and ZX → ZCD (these rules use context). Denote by X1 the element X in the position after Y, and by X2 the element X in the position after Z. Then one can pass to rules that do not use context: X1 → AB and X2 → CD. That is, finer categories of elements are introduced that take their position in context into account. Let us show how the transition to G2 formulations can be done for the context-based examples above. On the left is the required fragment of the corresponding context-dependent grammar, on the right an equivalent fragment consisting of context-free rules.

1. Choice of the flexion depending on the type of stem [7, 10, 18-20, 32-33, 36, 45]. Context-dependent fragment:

Word_fl.gn → O^i F_fl.gn
O^1 F_fl.gn → O^1 ds (frien+ds)
O^2 F_fl.gn → O^2 ys (to+ys)
O^3 F_fl.gn → O^3 ens (childr+ens)
O^4 F_fl.gn → O^4 ves (relati+ves)
O^5 F_fl.gn → O^5 Λ (cars)
...

Context-free equivalent:

Word_fl.gn → O^i F^i_fl.gn
F^1_fl.gn → ds; F^2_fl.gn → ys; F^3_fl.gn → ens; F^4_fl.gn → ves; F^5_fl.gn → Λ
...

where Word is a word form, O^i is a stem of type i (i = 1, 2, 3, ...), and F_fl.gn is the flexion of number and case (here the plural).

2. Choice of the direct object depending on negation. Context-dependent fragment:

X~ R~ N~_d → X~ R^i N~_acc (the student studies the discipline)
X~ ¬R~ N~_d → X~ ¬R^i N~_gen (the student does not study the discipline)

Context-free equivalent:

R~ → R^i N~_d1; ¬R~ → ¬R^i N~_d2; N~_d1 → N~_acc; N~_d2 → N~_gen

where R~ is a verb group, R^i a transitive verb, N~_d the direct object, N~ a noun group, and ¬ negation [7, 10, 18-20, 32-33, 36, 45].

3. Possibility of a subject with a verbal noun, depending on the presence of the object (the user's viewing of content). Context-dependent fragment (the subject group may be attached only when the object group is present):

N~ → N° N~_ob; N° N~_ob → N° N~_ob N~_sb

Context-free equivalent, indexing the verbal noun by its environment:

N~ → N°_1 N~_ob N~_sb; N~ → N°_2 N~_ob; N~ → N°_3

where N° is the verbal noun, N~_ob the object group and N~_sb the subject group [7, 10, 18-20, 32-33, 36, 45].
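The context-free encoding of example 1 can be sketched in a few lines of Python. The stem-to-type mapping and the endings transcribe the fragment above; the function and table names are, of course, our own illustrative assumptions:

```python
# The "encode context into categories" trick from example 1: instead of one
# flexion symbol F chosen by its context (the stem type), each stem type i
# selects its own indexed symbol F_i. Stem splits follow the rules above.
STEM_TYPE = {"frien": 1, "to": 2, "childr": 3, "relati": 4, "cars": 5}
FLEXION = {1: "ds", 2: "ys", 3: "ens", 4: "ves", 5: ""}   # F_1..F_5; F_5 = Λ

def plural(stem: str) -> str:
    """Context-free rewriting: Word -> O^i F_i, then F_i -> ending."""
    i = STEM_TYPE[stem]        # the 'context' is now carried by the index i
    return stem + FLEXION[i]

for s in STEM_TYPE:
    print(s, "->", plural(s))  # friends, toys, childrens, relatives, cars
```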
In these examples exactly the same formal technique is used: contextual information is encoded in new categories. The fewer contexts are used, the more categories have to be introduced, and vice versa. The appeal of switching to context-free rules is that the degree of complexity of varied and heterogeneous contextual references is difficult to evaluate, whereas in G2 grammars the natural measure of complexity is simply the number of categories used.

There is, however, a case in which context cannot be abandoned: rules that shift (permute) symbols cannot be imitated by context-free rules, so a grammar G2 cannot generate a language whose chains essentially require permutations. Consider, for example, a language containing all chains of the form a1 a2 a3 q a'1 a'2 a'3, a2 a1 a2 a3 q a'2 a'1 a'2 a'3, a1 a3 a2 a1 q a'1 a'3 a'2 a'1, etc. (in general, such chains can be written as x q x'), and no other chains. The symbols a1 and a'1, a2 and a'2, and so on can be understood as pairs of elements that must agree with each other in some way; what matters is not the symbols a1, a'1 themselves, but their corresponding occurrences in the chains. This language is easily generated by a grammar containing permutation rules (for such a grammar there exists an equivalent grammar G1); it can be represented as follows:

1. I → I Ai a'i
2. a'i Aj → Aj a'i (i, j = 1, 2, 3)
3. I Ai → ai I
4. I → q

(ai, a'i, q are basic symbols; Ai are auxiliary symbols; I is the initial symbol). Let us show, for example, how the chain a2 a1 a1 a3 q a'2 a'1 a'1 a'3 can be derived in this grammar:

1. I
2. (rule 1) I A3 a'3
3. (1) I A1 a'1 A3 a'3
4. (1) I A1 a'1 A1 a'1 A3 a'3
5. (1) I A2 a'2 A1 a'1 A1 a'1 A3 a'3
6. (2) I A2 A1 a'2 a'1 A1 a'1 A3 a'3
7.-11. (2; five more times) I A2 A1 A1 A3 a'2 a'1 a'1 a'3
12. (3) a2 I A1 A1 A3 a'2 a'1 a'1 a'3
13.-15. (3; three more times) a2 a1 a1 a3 I a'2 a'1 a'1 a'3
16. (4) a2 a1 a1 a3 q a'2 a'1 a'1 a'3

The language {x q x'} cannot be generated by any grammar G2. Yet this phenomenon does occur in natural languages, i.e. fragments consisting of chains of the form x q x' are possible. Two examples of this kind are described in the literature.

1) Constructions of the type Саша^a, Софія^b, Катя^c, Данило^d, … – спортсмен^a', співачка^b', художниця^c', поет^d', … ('Sasha, Sofia, Katia, Danylo, … are, respectively, an athlete, a singer, an artist, a poet, …') [7, 10, 18-20, 32-33, 36, 45]. Here the role of x (a b c d …) is played by the chain of proper names, and the role of x' (a' b' c' d' …) by the chain of professions, which must agree with these names in gender [58]; q is the dash (more precisely, the null copula).

2) According to [62], in one American Indian language sentences are widespread in which the main complement is duplicated by incorporating the corresponding stems into the verb-predicate, roughly 'the artist landscape-paints a landscape'. In addition, any verb (including one incorporating its complement) is easily substantivized and acquires the ability to act as a complement itself: for instance, моя дитина вподобала книгочитання ('my child took a liking to book-reading') [7, 10, 18-20, 32-33, 36, 45]. In theory this process can be repeated an unlimited number of times: він книгочитанняцікавість^x думає^q про книгочитанняцікавість^x' ('he book-reading-interest-thinks about book-reading-interest'). Here x' (= a' b' c' d') is the complement, x (= a b c d) its duplicate incorporated into the predicate, and q is the preposition.
Such a construction is correct only when the incorporated duplicate of the complement exactly matches the complement in the composition and order of its stems. Taking these examples into account, grammars G2 are not sufficient for describing natural languages in their entirety. But both examples are peripheral: the first construction, although probably admissible in any language, is very specific and rarely used; the second, although very general and apparently common enough, is known only in one less widespread language. Therefore, for all their theoretical value, these examples can be neglected; if one abstracts from them, grammars G2 can be considered in principle a sufficient means for describing natural languages. This statement, of course, cannot be rigorously proven; belief in its truth rests on a number of empirical considerations.

1. There are the so-called categorial grammars, which belong to the recognizing type. These grammars were developed and applied to natural languages independently of grammars G2, and no examples of their inadequacy (except the two mentioned above) have been found so far. Yet it has been proved [14-16] that the class of languages described by categorial grammars coincides with the class of context-free languages.

2. More recently, pushdown automata, capable of both recognition and generation, have been proposed for describing languages. Chomsky proved that all languages processed by such automata are context-free, and vice versa [38-41, 49-57]. Thus another formal model of natural language, introduced for independent reasons, turns out without significant difficulty to be equivalent to grammars G2.

3. Within mathematical linguistics a class of so-called finitely characterizable languages, intuitively close to natural languages, is readily distinguished. All finitely characterizable languages are context-free (the converse is not true!). This again suggests that grammars G2 are capable of generating natural languages.

4. Finally, a number of algorithms for the automatic analysis and generation of natural-language texts in effect use descriptions of the corresponding languages by grammars G2 or equivalent systems. Such grammars underlie, for example, the syntax-analysis algorithms for several languages developed at the University of Texas [64], a number of algorithms using the so-called Cocke method [61], and some other algorithms mentioned in [6, 59, 66].

All this argues that grammars G2 are sufficient for natural languages. In particular, it is worth noting that constructions of the type a b c d … d' c' b' a' (mirror-image agreement), which are not described by grammars G3, are easily generated by a grammar G2. Indeed, the language consisting of exactly the chains of this type (composed of a1, a2, a3, a'1, a'2, a'3) is generated by a grammar G2 containing only six rules: I → ai I a'i, I → ai a'i (i = 1, 2, 3); a generator for this grammar is sketched below.

Two further important points must now be made. Firstly, what has been said does not mean that grammars G2 generate only natural languages and/or languages close to them: among context-free languages there are some not at all similar in structure to natural ones. Secondly, the fact that grammars G2 are practically sufficient for describing natural languages does not imply that they are always convenient for the purpose, that is, that they allow any natural-language construction to be described in a natural way.
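The six-rule mirror grammar mentioned above is easy to animate; the sketch below generates chains of the language and checks the mirror agreement. The symbol names ("a1", "a1'") and the random control strategy are illustrative assumptions:

```python
import random

# The six-rule grammar I -> a_i I a_i' | a_i a_i' (i = 1, 2, 3), which
# generates the mirror language a b c ... c' b' a'.
SYMS = ["a1", "a2", "a3"]

def generate(max_depth: int = 4) -> list[str]:
    """Expand I: each step wraps the result in a matching pair a_i ... a_i'."""
    i = random.choice(SYMS)
    if max_depth <= 1 or random.random() < 0.5:
        return [i, i + "'"]                               # I -> a_i a_i'
    return [i] + generate(max_depth - 1) + [i + "'"]      # I -> a_i I a_i'

def recognize(chain: list[str]) -> bool:
    """Mirror agreement: the k-th symbol from the right pairs with the k-th from the left."""
    n = len(chain)
    return n % 2 == 0 and all(chain[n - 1 - k] == chain[k] + "'" for k in range(n // 2))

c = generate()
print(c, recognize(c))   # e.g. ['a2', 'a1', "a1'", "a2'"] True
```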
Grammar G2 does not, for example, provide a natural description for the so-called non-projective structures, that is, for structures with discontinuous constituents (those whose dependency arrows intersect, or nest one inside another) [14-16]. Non-projective structures occur in a variety of languages:

Ukr. Наша мова, як і будь-яка інша, посідає унікальне місце. [7, 10, 18-20, 32-33, 36, 45]
Rus. К этой поездке может пробудить интерес только выступление директора. [6, 9, 14-16]
Eng. A theorem is stated which describes the properties of this function. [1, 34, 38-41, 49-60]
Ger. … die Tatsache, daß die Menschen die Fähigkeit besitzen, Verhältnisse der objektiven Realität in Aussagen wiederzuspiegeln. [25, 30-31, 35, 42]
Fr. … la guerre, dont la France portait encore les blessures… [14-16]
Serbo-Croatian: Regulacija procesa jedan je od najstarijih oblika regulacije. [14-16]
Hung. Azt hisszem, hogy késedelmemmel sikerült bebizonyítani. [14-16]

To describe the structure of such phrases in terms of constituents (and grammars G2 ascribe precisely a constituent structure), a natural description requires discontinuous constituents: all words dependent on one word must form, together with it, a single constituent, and in the absence of projectivity this necessarily leads to discontinuous constituents (к этой поездке … интерес; a theorem … which describes the properties of this function, etc.). However, the constituent systems that grammars G2, like any immediate-constituent grammars, ascribe to phrases cannot contain discontinuous constituents.

Consider two special cases of grammars G2 that are equivalent to grammars G3. The first case. In natural languages dependent words can be placed to the right of the head word (right subordination) [14-16]: назва курсу, лист бумаги, une règle stricte, give him; or to the left of the head word (left subordination) [14-16]: основний курс, белый лист, cette règle, good advice. Both right and left subordination can be sequential [7, 10, 18-20, 32-33, 36, 45]: витяг з протоколу звітування з наукової діяльності заступника завідувача кафедри ІСМ інституту ІКНІ Національного університету «Львівська політехніка» міста Львова країни Україна, or жена сына заместителя председателя второй секции эклектики совета по прикладной мистике при президиуме Академии наук королевства Мурак; and досить повільно рухлива черепаха, or очень быстро бегущий олень.

Depending on the language, one or another consecutive right- or left-subordinated construction may be theoretically unlimited: such are, for example, consecutive right subordination in Ukrainian, and the analogous construction in Lithuanian, where the dependent always precedes its head, which leads to unlimited left subordination. The fact that the world's languages differ and can be classified by the predominance of right or left subordination, and in particular by the possibility of unlimited consecutive subordination in one direction or the other, was noted and investigated in [63]. Attention was also drawn to this problem, in connection with the use of grammars G2 for describing natural languages, by V. Ingve [21, 65]. He noted that there is a large number of languages (e.g. English, Ukrainian, French, etc.)
in which consecutive right subordination is in principle unrestricted, while in left subordination the chain length is always limited owing to structural features of these languages. Ingve's hypothesis [21] attempts to explain this empirical observation by certain general patterns of the structure of the human psyche.

It turns out that a grammar G2 generating such a language has the following interesting property: for any derivable terminal chain there exists a derivation in which, at every line, all the auxiliary symbols are gathered at the right end, occupying no more than the K last places (K is a constant fixed for the given grammar, that is, the same for all derivations in it). For a grammar G2 to have this property, bounding consecutive left subordination is not enough; a number of stronger and not easily formulated requirements must be fulfilled [26], implying, for example, bounds on right parallel subordination and on consecutive embeddings [48].

If each line of such a derivation is divided into two parts, the left part running up to the first auxiliary symbol X and the right part from X inclusive to the end (the right part may also contain terminal symbols), then the right part always contains no more than K symbols. The left part is interpreted as the already "issued" piece of the generated chain (at subsequent steps of the derivation this piece undergoes no further processing), and the right part as the working area that the grammar must keep in memory. Thus the number K is nothing but the maximum amount of memory required for generating any chain of the given grammar (there will be chains that cannot be generated with memory less than K). This number coincides with the maximum length of a chain of consecutive left subordinates possible in the given language: if a language allows no more than three consecutive left subordinates, then, in generating it, one can always construct a derivation in which no more than three symbols need to be remembered at any moment. This relationship between the admissible depth of left subordination and the amount of memory was established by V. Ingve [21, 65-66].

Let us illustrate this with an example: consider a grammar G2 generating certain nominal groups of the Ukrainian language (glossed here in English) [7, 10, 18-20, 32-33, 36, 45], in which right subordination is unlimited and the depth of left subordination does not exceed two.

Example 1. A scheme of the grammar G2:

N~_x,y,z → N~_x,y,z N~_x,y,g (a noun group extended on the right by a noun group in the genitive)
N~_x,y,z → A_x,y,z N~_x,y,z (an adjective group on the left of the noun group)
A_x,y,z → very A_x,y,z | enough A_x,y,z | exactly A_x,y,z | easily A_x,y,z | … (a degree adverb on the left of the adjective)
A_x,y,z → informational_x,y,z | simple_x,y,z | important_x,y,z | …
N~_m,y,z → request_y,z | user_y,z | resource_y,z | business_y,z | …
N~_f,y,z → system_y,z | …

Here the subscripts x, y, z encode gender, number and case, and g marks the genitive. (The peculiarities of the agreement of A with animate nouns are not taken into account.)
Here is an example of a derivation in this grammar G2 [7, 10, 18-20, 32-33, 36, 45]:

N~_m,sg
A_m,sg N~_m,sg
simple A_m,sg N~_m,sg
simple enough A_m,sg N~_m,sg
simple enough informational N~_m,sg
simple enough informational N~_m,sg N~_m,g
simple enough informational request N~_m,g
simple enough informational request N~_m,g N~_m,g
simple enough informational request (of) user N~_m,g
simple enough informational request (of) user N~_m,g N~_f,g
simple enough informational request (of) user (of) resource N~_f,g
simple enough informational request (of) user (of) resource N~_f,g N~_m,g
simple enough informational request (of) user (of) resource system N~_m,g
simple enough informational request (of) user (of) resource system (of) business

Example 2. A scheme of a grammar G2 of the same type:

N~_x,y,z → N~_x,y,z N~_x,y,g; N~_x,y,z → A_x,y,z N~_x,y,z
A_x,y,z → very A_x,y,z | really A_x,y,z | …; A_x,y,z → joyful_x,y,z | childish_x,y,z | …
N~_m,y,z → laughter_y,z | pupil_y,z | Lviv_y,z | …; N~_f,y,z → school_y,z | …; N~_n,y,z → city_y,z | …

(The peculiarities of the agreement of A with animate nouns are again not taken into account.) An example of a derivation in this grammar G2 [7, 10, 18-20, 32-33, 36, 45]:

N~_m,sg
A_m,sg N~_m,sg
very A_m,sg N~_m,sg
very joyful N~_m,sg
very joyful A_m,sg N~_m,sg
very joyful childish N~_m,sg
very joyful childish N~_m,sg N~_m,g
very joyful childish laughter N~_m,g
very joyful childish laughter N~_m,g N~_f,g
very joyful childish laughter (of) pupil N~_f,g
very joyful childish laughter (of) pupil N~_f,g N~_n,g
very joyful childish laughter (of) pupil (of) school N~_n,g
very joyful childish laughter (of) pupil (of) school N~_n,g N~_m,g
very joyful childish laughter (of) pupil (of) school (of) city N~_m,g
very joyful childish laughter (of) pupil (of) school (of) city Lviv

In this derivation the amount of memory is two: no intermediate chain contains more than two auxiliary symbols. The same chain could be generated in a different way, using more memory, for example by first deriving from N~_m,sg the chain very A_m,sg A_m,sg N~_m,sg N~_m,g N~_f,g N~_n,g N~_m,g and only then the terminal chain. What matters for us, however, is the minimum amount of memory with which the chain can be obtained, and here it equals two. One can prove that any terminal chain derivable in this G2 can be generated with the amount of memory 2. The proof rests on a very simple observation: a "good" derivation should be constructed so that, for each noun, its left dependents are first issued in terminal form, and only then is the noun group deployed to the right.
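The "good" derivation order just described can be simulated directly. The sketch below, under assumptions (the lexicon and the random control strategy are illustrative, and agreement features are dropped), rewrites the leftmost auxiliary symbol and only pushes a genitive group when a noun is emitted in the same step, so that no intermediate chain ever holds more than two auxiliary symbols:

```python
import random

# Rules: N~ -> A N~ | noun | noun N~ (genitive pushed as the noun is issued);
# A -> adv A | adj. Peak memory = max auxiliaries in any intermediate chain.
ADVS, ADJS = ["very", "enough"], ["simple", "informational"]
NOUNS = ["request", "user", "resource", "system", "business"]

def derive(genitive_depth: int = 3):
    chain, peak = ["N~"], 1
    while any(s in ("N~", "A") for s in chain):
        i = next(k for k, s in enumerate(chain) if s in ("N~", "A"))
        if chain[i] == "A":
            rhs = [random.choice(ADVS), "A"] if random.random() < 0.5 else [random.choice(ADJS)]
        elif random.random() < 0.5:
            rhs = ["A", "N~"]                     # adjective group on the left
        elif genitive_depth > 0 and random.random() < 0.7:
            rhs = [random.choice(NOUNS), "N~"]    # noun issued, genitive group pushed
            genitive_depth -= 1
        else:
            rhs = [random.choice(NOUNS)]          # N~ -> noun
        chain[i:i + 1] = rhs
        peak = max(peak, sum(s in ("N~", "A") for s in chain))
    return " ".join(chain), peak

print(derive())   # e.g. ('enough simple request user', 2)
```

Since a genitive group is pushed only together with a terminal noun, the chain never contains more than one N~ and one A at a time, matching the memory bound of two discussed above.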
Theorem 1. A grammar G2 of the type described (with bounded memory) is always equivalent to some grammar G3 [14-16]. The proof is not difficult (the right-hand part of a line, a string of at most K symbols, is encoded by a single new auxiliary symbol). Thus, for languages with a bounded depth of left subordination, grammars G2 with bounded memory, which are equivalent to grammars G3 and close to them in the construction of derivations, i.e. arranged much more simply than arbitrary grammars G2, are not only sufficient in principle but also very convenient: they provide a natural description.

There are, however, languages in which not only right but also left sequential subordination has unlimited depth. Such a language is, for example, Hungarian, where unlimited left subordination comes from preposed extended attributes, and unlimited right subordination from, for example, relative clauses (the house that Jack built) [14-16]. Consider an example from a novel by G. Fehér, a joking toast given in [68] (the superscripts number the links in the chain of subordination):

1. Kivánom, hogy valamint az agyag23 ölelő karjai22 közül kibontakozni21 akaró20 kocsikerék19 rettentő nyikorgásától18 megriadt17 juhászkutya16 bundájába15 kapaszkodó14 kullancs13 kidülledt félszeméből12 alácseppent11 könnycseppben10 visszatükröződő9 holdvilág fényétől8 illuminált7 rablólovagvár6 felvonóhídjából5 kiálló4 vasszegek3 kohéziós erejének2 hatása1 évszázadokra összetartja annak materiáját, aképpen tartsa össze ezt a társaságot az igaz szeretet.

2. Я хочу, аби справжнє кохання скріпило цю компанію так, як на століття скріплює матеріал мосту дія1 єднальної сили2 цвяхів3, що торчать4 з підіймального мосту5 розбійницького феодального замку6, освященного7 місячним світлом8, що відображається9 в краплині10, яка витікає11 з витріщеного ока12 клеща13, що вчепилася14 в шерсть15 вівчарки16, наполоханої17 жахливим скрипом18 возових колес19, що прагнуть20 вирватися21 з обіймів22 грязюки23 [7, 10, 18-20, 32-33, 36, 45]. (In English, roughly: 'I wish that true love may hold this company together just as the effect1 of the cohesive force2 of the nails3 sticking out4 of the drawbridge5 of the robber-baron castle6 illuminated7 by the moonlight8 reflected9 in the teardrop10 that dripped11 from the bulging eye12 of the tick13 clinging14 to the fur15 of the sheepdog16 frightened17 by the terrible creaking18 of the cart wheels19 striving20 to break free21 from the embraces22 of the mud23 holds together the material of the bridge for centuries.')

This phrase from a literary text has a depth of 22 and is absolutely correct from the grammatical point of view (to the same extent as its Ukrainian translation); moreover, nothing prevents continuing the chain of attributes ad libitum. To generate languages with this property, another special type of grammar G2 can be proposed, in a certain sense more general than the bounded-memory grammars G2 discussed above. First let us state more precisely which languages are meant. They are languages in which an unlimited number of structures sequentially subordinated from left to right, X1 X2 … Xi …, is possible (unlimited right subordination), and within each structure Xi unlimited left subordination is possible, a sequence of structures … Xij … Xi3 Xi2 Xi1; unlimited deployment inside the structures Xij themselves, however, is not possible. With regard to Hungarian, Xi can be understood as a simple clause, each clause (except the first) being an attributive clause of the previous one, and Xij as a preposed participial attribute. Consider a grammar Г' = (V', V1', I, S') whose basic vocabulary V' consists of n symbols A1, A2, …, An and whose rules have the form X → Y Ai or X → Ai, where X and Y belong to V1' [14-16].
Let us associate with each symbol Ai a regular grammar Гi = (V, V1i, Ai, Si), where V is the basic vocabulary, common to all Гi; V1i is the auxiliary vocabulary, having no symbols in common with V' or V1' except Ai; Ai is the initial symbol; and the rules of the scheme Si have the form C → dD or C → c (here, as in the other examples, capital letters denote auxiliary symbols and lowercase letters basic ones). We assume that the auxiliary vocabularies of the grammars Гi are pairwise disjoint.

The grammar Г' is very close to an automaton grammar, differing from it only in the direction of unfolding (the direction of unfolding refers to the side on which "terminal" symbols are generated); in fact, it is an automaton grammar up to mirror symmetry. So we are dealing with one quasi-regular grammar with right unfolding and n regular grammars with left unfolding. Consider now the union of all these grammars: the grammar Г whose basic vocabulary is V (the same as in all the Гi), whose auxiliary vocabulary is V1 = V' ∪ V1' ∪ V11 ∪ V12 ∪ … ∪ V1n (the union of the auxiliary vocabularies of all the grammars Г', Г1, Г2, …, Гn together with the basic vocabulary of Г'), whose initial symbol is I (the same as in Г'), and whose scheme is the union of the schemes of all the grammars Г', Г1, Г2, …, Гn. This grammar Г is a special context-free grammar that can be called a context-free grammar with independent bilateral unfolding. That it is not an automaton grammar is obvious, if only because some of its rules (the rules of the scheme S') have two auxiliary symbols on the right-hand side: the basic symbols of Г' (i.e. A1, A2, …, An) are auxiliary within Г, so the rules of the form X → Y Ai are not "automaton" rules within Г. Nevertheless, Г is equivalent to an automaton grammar. Here is an example (a schema) of such a grammar:

S': I → B A1; B → C A1; C → B A2; C → D A3; D → D A4; D → A2.
S1: A1 → b P1; P1 → a Q1; Q1 → a Q1; Q1 → c.
S2: A2 → d.
S3: A3 → a P3; A3 → b Q3; A3 → c R3; P3 → a; Q3 → b; R3 → d R3; R3 → e R3; R3 → d.
S4: A4 → c P4; P4 → b.

A grammar of the type introduced by Gladkiy works as follows. First the generated chain is unfolded, ad libitum, by the symbols Ai (which can be interpreted, for example, as syntactic groups or clauses Si); this is done by the rules of S'. Then each Ai can be unfolded, ad libitum, in the opposite direction into a chain of terminal symbols (which can be interpreted as words). Such a generation process is convenient precisely in cases such as the Hungarian phrases of the type discussed above.

Theorem 2. Every context-free grammar with independent bilateral unfolding is equivalent to some regular grammar [14-16].
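A small simulation may clarify the two-phase generation. The rule tables below transcribe the example schema above as reconstructed; everything else (function names, the random control strategy, the placement of each new Ai next to the auxiliary symbol, following the printed rule form X → Y Ai) is an illustrative assumption:

```python
import random

# Phase 1 (S'): unfold a chain of A_i symbols; phase 2 (S_i): unfold each A_i
# into terminal characters by its own regular grammar. None stands for a rule
# with no auxiliary symbol on the right-hand side.
S_PRIME = {"I": [("B", "A1")], "B": [("C", "A1")],
           "C": [("B", "A2"), ("D", "A3")], "D": [("D", "A4"), (None, "A2")]}
S_I = {
    "A1": {"A1": [("b", "P1")], "P1": [("a", "Q1")], "Q1": [("a", "Q1"), ("c", None)]},
    "A2": {"A2": [("d", None)]},
    "A3": {"A3": [("a", "P3"), ("b", "Q3"), ("c", "R3")], "P3": [("a", None)],
           "Q3": [("b", None)], "R3": [("d", "R3"), ("e", "R3"), ("d", None)]},
    "A4": {"A4": [("c", "P4")], "P4": [("b", None)]},
}

def unfold_ai(ai: str) -> str:
    """Unfold one A_i into terminals with its regular scheme S_i."""
    state, out = ai, []
    while state is not None:
        terminal, state = random.choice(S_I[ai][state])
        out.append(terminal)
    return "".join(out)

def generate() -> str:
    x, ais = "I", []
    while x is not None:
        y, ai = random.choice(S_PRIME[x])   # rule X -> Y A_i (or X -> A_i)
        ais.insert(0, ai)                   # new A_i lands next to the auxiliary
        x = y
    return " ".join(unfold_ai(a) for a in ais)

print(generate())   # e.g. "d cded baac baac"
```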
Unbounded grammars of type 0 are only a special case of the general concept of a grammar, but they are certainly sufficient for describing all natural languages in their entirety. Any natural language (as a set of correct phrases) is an easily recognizable set: there exists a fairly straightforward algorithm recognizing its phrases. If a language is recognized by an algorithm with the indicated memory bound, it can be generated by a grammar in which, for any terminal chain of length n, there is a derivation in which no intermediate chain exceeds the length Kn (K is some constant). Such a grammar is a grammar with bounded stretching, whose space (memory) complexity function is at most linear. For any grammar with bounded stretching one can construct an equivalent grammar G0 capable of describing the set of correct phrases of any natural language, that is, of generating all the correct phrases of the given language without generating any incorrect ones. Both constructions given above as examples of the inapplicability of context-free grammars are easily described by a grammar G0. The disadvantages of the G0 method of derivation reduce to three points.

1. It is impossible to describe naturally phrases with discontinuous constituents.

2. The grammar contains only rules for the formation of linguistic expressions, such as word forms or phrases; it merely sets the correct expressions apart from the incorrect ones.

3. Grammar G0 builds sentences at once with exactly the word order those sentences must have in their final form. It thereby generates a syntactic structure in the form of an ordered tree, that is, a tree in which, besides the subordination relation given by the tree itself, there is also a linear order relation (left to right). Thus the syntactic structure produced by grammar G0 fails to separate two relations that are quite different in nature, though connected: syntactic subordination and linear precedence. But to characterize the syntactic structure means precisely to specify the relation of syntactic subordination; the linear order relation characterizes not the structure but the phrase itself. The order of words depends on the syntactic structure, is determined largely by it, and is therefore something derivative and secondary with respect to it.

It is advisable to modify the concept of a generating grammar so that the left and right parts of the substitution rules are not linearly ordered chains but, for example, trees (without linear ordering) depicting syntactic relations [14-16]. The rules then rewrite one tree fragment into another: the branches of the trees represent syntactic links of different types, and the letters A, B, C, … are syntactic categories; the relative arrangement of symbols on one level of subordination plays no role and is accidental (a node A dominating B and C is the same tree as A dominating C and B) [14-16]. (The tree-rule diagrams of the original are not reproduced here.) The result is a calculus of the syntactic structures (not the phrases) of the language. This calculus is one part of the generating grammar; its other part is a calculus that, for any given syntactic structure, specifies (taking into account other factors, such as, in Ukrainian, the obligatory accounting of logical emphasis) all the possible linear sequences of words for it. The problem of discontinuous constituents is thereby removed. A natural representation of the structure of the immediate constituents of a sentence cannot be obtained from a regular grammar: regular grammars do ascribe some constituent structure, as in general do all immediate-constituent grammars, but these constituents are usually purely formal.

Let С1 denote content gathered from different information resources. Textual content С2 (an article, a commentary, a book, etc.) from С1 contains a considerable amount of natural-language data, some of it abstract. The text is presented as a unified sequence of character units whose main properties are informational, structural and communicative connectivity and integrity, reflecting the content and structure of the text. Linguistic analysis of content (comments, forums, articles, etc.)
is a method of text processing. The processing divides the content into tokens using finite-state machines. As a functional-semantic-structural unity, a text obeys rules of construction and reveals patterns of the meaningful and formal connection of its constituent units. Cohesion manifests itself through external structural indicators and the formal dependence of the text components; integrity, through thematic, conceptual and modal dependence. Integrity leads to the meaningful and communicative organization of the text, cohesion to its form and structural organization. The commercial-content keyword-detection operator α: (С2, U_K, T) → С3 maps the commercial content С2 to a new state that differs from the previous one by the presence of a set of keywords that generally describe its content. The analysis investigates the multilevel structure of textual content: a linear sequence of characters; a linear sequence of morphological structures; a linear sequence of sentences; a network of interconnected unities (alg. 1).

Algorithm 1. Linguistic analysis of textual commercial content.
Section 1: Graphemic and grammatical analysis of the textual content С2.
Step 1. Divide the textual commercial content С2 into sentences and paragraphs.
Step 2. Divide the character chain of С2 into words.
Step 3. Mark out numbers, dates, invariable turns of phrase and abbreviations in С2.
Step 4. Remove the non-text characters of С2.
Step 5. Form and analyze the linear sequence of words with service marks for the content С2 (alg. 3).
Section 2: Morphological analysis of the textual content С2.
Step 1. Obtain stems (word forms with the endings cut off).
Step 2. Form a grammatical category for each word form (the set of its grammatical meanings: gender, case, etc.).
Step 3. Form the linear sequence of morphological structures.
Section 3: Syntactic analysis α: (С2, U_K, T) → С3 of the textual content С2 (alg. 2).
Section 4: Semantic analysis of the textual content С3.
Step 1. Correlate the words with semantic vocabulary classes.
Step 2. Select the morphosemantic alternatives needed for the given sentence.
Step 3. Bind the words into a single structure.
Step 4. Generate an ordered set of superposition entries from basic lexical functions and semantic classes. The accuracy of the result is determined by the completeness and correctness of the dictionary.
Section 5: Reference analysis for the formation of interphrase unities.
Step 1. Contextual analysis of the textual commercial content С3: it resolves local references (this one, he, his) and identifies the utterance that forms the kernel of the unity.
Step 2. Thematic analysis: dividing the statements into theme and rheme distinguishes the thematic structures used, for example, in forming a digest.
Step 3. Determine the regular repetition, synonymization and re-nomination of keywords; referential identity, that is, the relation of words to the depicted subject; and the presence of implication based on situational connections.
Section 6: Structural analysis of the textual content С3. The prerequisites for its use are a high degree of coincidence between a unity, a discursive unit, a sentence in a semantic language, an utterance and an elementary discursive unit.
Step 1. Identify the basic set of rhetorical connections between content unities.
Step 2. Build a nonlinear network of unities. The openness of the link set allows it to be extended and adapted for analyzing the structure of the text.
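Section 1 of Algorithm 1 can be sketched as follows; sentence and word splitting is approximated here with regular expressions standing in for the finite-state machines mentioned above, and all patterns and names are illustrative assumptions:

```python
import re

# Graphemic analysis (alg. 1, sec. 1): split content into sentences, then
# mark out dates, numbers and words as token classes.
DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b")
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)?\b")
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*", re.UNICODE)

def graphemic_analysis(content: str) -> list[dict]:
    """Return, per sentence, a list of (token class, token text) pairs."""
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    analyzed = []
    for sentence in sentences:
        tokens = ([("date", d) for d in DATE.findall(sentence)] +
                  [("number", n) for n in NUMBER.findall(DATE.sub(" ", sentence))] +
                  [("word", w) for w in WORD.findall(sentence)])
        analyzed.append({"sentence": sentence, "tokens": tokens})
    return analyzed

for s in graphemic_analysis("The user sent 12 requests on 05.11.2019. Simple enough."):
    print(s["tokens"])
```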
Parsers work in two stages: they identify meaningful tokens and create a parse tree (alg. 2). The text implements a structured activity involving a subject and an object, a process, a purpose, means and a result, which are reflected in content-structural, functional and communicative indicators. The units of the internal organization of the structure of a text are the alphabet, the vocabulary (paradigmatics), the grammar (syntagmatics), paradigmatic relations, syntagmatic relations, rules of identification, and the expression, interphrase unities and fragment blocks. At the compositional level there are sentences, paragraphs, sections, chapters, pages, etc., which, apart from sentences, are only indirectly related to the internal structure and so are not considered. A database (a base of terms and morphemes and of the auxiliary parts of speech) and defined text-analysis rules are used to search for a term.

Algorithm 2. Syntax analysis of commercial content.
Section 1: Identification of the content tokens U_K1 ⊆ U_K for the commercial content С2.
Step 1. Define a chain of terms as a sentence.
Step 2. Identify the noun group using the stem dictionary.
Step 3. Identify the verb group using the stem dictionary.
Section 2: Creating a parse tree from left to right. A derivation step of the tree either expands one of the symbols of the previous string of the sequence of linguistic variables or replaces it with another; the other symbols are rewritten without change. On deployment, the replaced or rewritten symbols (ancestors) are connected directly to the symbols that result from the deployment, replacement or rewriting (descendants), and a constituent, or syntax, tree of the commercial content is obtained.
Step 1. Deploy the noun group; deploy the verb group.
Step 2. Realize the syntactic categories with word forms.
Section 3: Determining the set of content keywords α: (С2, U_K, T) → С3 for С2.
Step 1. Identify the terms Noun ∈ U_K1: the nouns, noun phrases and noun-adjective pairs among the words of the textual content.
Step 2. Compute the uniqueness Unicity for the terms Noun ∈ U_K1.
Step 3. Compute NumbSymb ∈ U_K3 (the number of characters without spaces) for the terms Noun with acceptable Unicity.
Step 4. Compute UseFrequency ∈ U_K2, the frequency of occurrence of the content keywords. For NumbSymb ≤ 2000 the frequency UseFrequency should lie within [6; 8] %, for NumbSymb ≥ 3000 within [2; 4] %, and for 2000 < NumbSymb < 3000 within [4; 6] %.
Step 5. Compute BUseFrequency, the frequency of keyword occurrence at the beginning of the text; IUseFrequency, the frequency of occurrence in the middle of the text; and EUseFrequency, the frequency of occurrence at the end of the text.
Step 6. Compare the values BUseFrequency, IUseFrequency and EUseFrequency for prioritization: keywords with a higher BUseFrequency have a higher priority than keywords frequent only in the middle or at the end of the text.
Step 7. Sort the keywords according to their priorities.
Section 4: Fill the search-engine base of the content С3, that is, the attributes KeyWords ∈ U_K4 (the keywords), Unicity (keyword uniqueness ≥ 80), Noun (the term), NumbSymb (the number of characters without spaces), UseFrequency (the frequency of the keywords), BUseFrequency (their frequency at the beginning of the text), IUseFrequency (their frequency in the middle of the text) and EUseFrequency (their frequency at the end of the text).
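Steps 4-7 of Section 3 can be sketched as follows. The positional weighting (beginning counted highest) follows the priority rule stated above, while the specific weights and names are illustrative assumptions:

```python
from collections import Counter

# Score candidate keywords by frequency in the beginning, middle and end
# thirds of the text (BUseFrequency, IUseFrequency, EUseFrequency) and sort
# by priority, favouring keywords frequent at the beginning.
def keyword_priorities(words: list[str], candidates: set[str]) -> list[tuple[str, float]]:
    n = len(words)
    thirds = [words[:n // 3], words[n // 3: 2 * n // 3], words[2 * n // 3:]]
    b, i, e = (Counter(t) for t in thirds)
    scored = []
    for kw in candidates:
        use_frequency = 100.0 * (b[kw] + i[kw] + e[kw]) / n   # overall %, Step 4
        priority = 3 * b[kw] + 2 * i[kw] + e[kw]              # B > I > E, Step 6
        scored.append((kw, use_frequency, priority))
    scored.sort(key=lambda t: t[2], reverse=True)             # Step 7
    return [(kw, f) for kw, f, _ in scored]

text = ("content analysis selects keywords content keywords describe "
        "the topic of the content").split()
print(keyword_priorities(text, {"content", "keywords", "topic"}))
```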
Detecting the keywords of commercial content С2 from a text snippet is performed using the processes shown in Fig. 1. Based on the rules of generative grammar, the term is adjusted according to the rules of its use in context. The sentences set the boundaries for punctuation and for anaphoric and cataphoric references. The semantics of the text is determined by the communicative task of transmitting data; the structure of the text, by the internal organization of the text units and the patterns of their interrelation. Through parsing, the text is framed into a data structure, for example into a tree, that matches the syntactic structure of the input sequence and is best suited for further processing. After analyzing a snippet of text and a term, a new term is synthesized as a content-topic keyword, using the base of terms and their morphemes; next, terms are synthesized to form a new keyword using the base of the auxiliary parts of speech. The keyword-detection principle is based on Zipf's law and comes down to the selection of medium-frequency words (the most used words are discarded through stop dictionaries, and rare words are ignored).

Fig. 1. Use-case diagram of the keyword discovery process

Content analysis is responsible for the process of extracting grammatical data from a word through grapheme analysis and for correcting the results of morphological analysis through the analysis of the grammatical context of linguistic units (alg. 3).

Algorithm 3. Rubrication of textual commercial content.
Section 1: Divide the commercial content С3 into blocks.
Step 1. Submit the blocks of the commercial content С3 to the input of tree building.
Step 2. Create a new block in the block table.
Step 3. Accumulate characters up to a newline character.
Step 4. Check for a period before the newline character. If there is one, go to Step 5; if not, save the sequence to the table, parse the new content block of С3 and go to Step 3.
Step 5. Check for the end of the text of the content С3. If it is the end of the text, go to Step 6; if not, save the accumulated sequence to the table, parse the new content block of С3 and go to Step 2.
Step 6. Retrieve the block tree of the content С3 as a table U_CT^B ⊆ U_CT.
Section 2: Divide the blocks into sentences with the content structure of С3 preserved.
Step 1. The block table U_CT^B ⊆ U_CT is fed to the input. Create a sentence table U_CT^R ⊆ U_CT linked n-to-1 through the partition_code field with the block table of the content С3.
Step 2. Create a new sentence in the sentence table U_CT^R ⊆ U_CT.
Step 3. Accumulate characters up to a period, a semicolon or a newline character.
Step 4. Check for an abbreviation. If it is an abbreviation, go to Step 5; if not, save the sequence to the table, parse the new sentence and go to Step 2.
Step 5. Check for the end of the block text of the content. If it is the end of the text, go to Step 6; if not, save the sequence to the table U_CT^R ⊆ U_CT, parse the new sentence and go to Step 2.
Step 6. Obtain the sentence tree as a table U_CT^R ⊆ U_CT.
Step 7. Check for the end of the text of the content С3. If it is the end of the text, go to Step 8; if not, parse the new block and go to Step 1.
Step 8. Obtain the output tree of sentences in the form of tables U_CT^R ⊆ U_CT.
Section 3: Divide the sentences into tokens, indicating their belonging to the sentences, U_CT^L ⊆ U_CT.
Step 1. Form, on the basis of the sentence table, a token table U_CT^L ⊆ U_CT with the fields Code (a unique identifier), Sentence code (a number equal to the code of the sentence containing the token), Number (a number equal to the number of the token in the sentence) and Text (the text of the token).
Step 2. Read the sentence tokens to be parsed from the sentence table U_CT^R ⊆ U_CT.
Step 3. Create a new token in the token table U_CT^L ⊆ U_CT.
Step 4. Accumulate characters up to a period, a space or the end of the sentence and save them in the token table.
Step 5. Check for the end of the sentence. If it is the end, go to Step 6; if not, save the accumulated sequence to the table U_CT^L ⊆ U_CT, parse the new tokens and go to Step 3.
Step 6. Perform syntactic analysis on the basis of the raw data (alg. 2).
Step 7. Perform morphological analysis on the basis of the output data.
Section 4: Identify the topic of the commercial content, U_CT^T ⊆ U_CT.
Step 1. Build a hierarchical structure of the properties U_CT^T ⊆ U_CT of each lexical unit of the text, containing grammatical and semantic information.
Step 2. Form a lexicon with a hierarchical organization of property types, where each descendant type inherits and redefines the properties of its ancestor.
Step 3. Use unification as the basic mechanism for constructing the syntactic structure.
Step 4. Determine the keywords KeyWords of the commercial content С4 = α5(α4(С2, U_K), U_CT) with U_CT = {U_CT1, U_CT2, U_CT3, U_CT4}, where U_CT is the collection of rubric terms, U_CT1 is the set of thematic keywords from the dictionary, U_CT2 is the set of frequencies of keyword usage in commercial content, U_CT3 is the set of dependencies of keyword use across different topics (the coefficients relating a keyword to specific topics, within [0, 1], are determined by the moderator), and U_CT4 is the set of frequencies of usage of the content keywords in the content (alg. 2).
Step 5. Determine U_CT^T ⊆ U_CT with TKeyWords, the set of thematic keywords for KeyWords; Topic, the content topic; and Category, the content category.
Step 6. Determine FKeyWords, the frequency of keyword usage, and QuantitativeryTKey, the frequency of usage of the thematic keywords in the commercial content.
Step 7. Determine Comparison, the comparison of the occurrence of keywords of different topics; compute CofKeyWords, the coefficient of thematic content keywords; Static, the coefficient of the statistical importance of terms; and Addterm, the coefficient of the availability of additional terms. Compare the set of content keywords with the key concepts of the topics; if there is a match, go to Step 9, otherwise go to Step 8.
Step 8. Form a new rubric with the set of key concepts of the analyzed content С4.
Step 9. Assign a specific rubric to the analyzed commercial content С4.
Step 10. Compute Location, the coefficient of placement of the content С4 in the topic rubric.
Section 5: Fill the search-engine base with the attributes Topic (the content topic), Category (the content category), Location (the coefficient of content placement in the rubric), CofKeyWords (the coefficient of thematic content keywords), Static (the coefficient of the statistical importance of terms), Addterm (the coefficient of the availability of additional terms), TKeyWords (the thematic keywords), FKeyWords (the frequency of keyword use), Comparison (the comparison of the occurrence of keywords of different topics) and QuantitativeryTKey (the frequency of use of the thematic keywords in the text of the content С4).

The construction of the text of the content С4 is determined by the theme, the information expressed, the conditions of communication, the task of the message and the style of presentation. The semantic, grammatical and compositional structure of С4 is related to its stylistic characteristics, which depend on the identity of the author and are subordinate to the thematic and stylistic dominant of the text. The process of the rubrication of С4 is shown as a use-case diagram in Fig. 2. The main stages of determining the morphological features U_CT of the units of the text С4 are: defining the grammatical classes of words, that is, the parts of speech and the principles of their classification; isolating the morphological part of word semantics and substantiating the set of morphological categories and their nature; and describing the set of formal means assigned to the parts of speech and their morphological categories. The rubrication process С4 = α(α(С2, U_K), U_CT), through the automatic indexing of the components of the commercial content С3, is divided into successive blocks: morphological analysis, syntactic analysis, semantic-syntactic analysis of linguistic constructions, and variation of the content record of the textual content.

Fig. 2. A use-case diagram of the content rubrication process in the SEC

The following kinds of grammatical meaning are used: synthetic, analytical, analytical-synthetic and suppletive. Grammatical meanings are generalized on the basis of shared characteristics and can be divided into partial meanings. The concept of a grammatical category is used to refer to classes of identical grammatical meanings. Morphological meanings include the categories of gender, number, case, person, tense, mood, voice and aspect, combined into paradigms for classifying the parts of a text. The object of morphological analysis is the structure of the word, the forms of inflection and the ways of expressing grammatical meanings. The morphological features of the units of a text are tools for exploring the connection between vocabulary and grammar and their use in speech: paradigmatics (the distinct forms of word declension) and syntagmatics (the linear combinations of words and word forms).
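The rubric assignment of Algorithm 3, Section 4 (Steps 7-10) can be sketched as follows. The moderator-assigned topic coefficients in [0, 1] follow the description above; the concept tables, the threshold, and the scoring formula are illustrative assumptions:

```python
# Compare content keywords with the key concepts of each rubric; if no rubric
# matches above a threshold, form a new rubric (Step 8). All data illustrative.
RUBRIC_CONCEPTS = {
    "IT":      {"content": 0.9, "resource": 0.7, "system": 0.8},
    "culture": {"laughter": 0.6, "school": 0.5, "city": 0.4},
}

def assign_rubric(keyword_counts: dict[str, int], threshold: float = 1.0):
    """keyword_counts maps a keyword to its count; returns (rubric, location)."""
    best_rubric, best_location = None, 0.0
    for rubric, concepts in RUBRIC_CONCEPTS.items():
        # Location: count-weighted overlap between keywords and rubric concepts
        location = sum(c * concepts.get(kw, 0.0) for kw, c in keyword_counts.items())
        if location > best_location:
            best_rubric, best_location = rubric, location
    if best_location < threshold:              # Step 8: form a new rubric
        best_rubric = "new rubric: " + ", ".join(sorted(keyword_counts))
    return best_rubric, best_location

print(assign_rubric({"content": 6, "system": 4, "user": 2}))   # ('IT', 8.6)
```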
The implementation of the automatic encoding of text words, that is, the assignment of grammatical-class codes, is associated with grammatical classification. Morphological analysis comprises the following steps: isolating the stem of the word form; searching for the stem in the stem dictionary; comparing the word structure with the data in the dictionaries of stems, roots, prefixes, suffixes and flexions. In the analysis process, the meanings of words and the syntagmatic relations between content words are identified. The tools of analysis are: dictionaries of stems, flexions and homonyms and of statistical and syntactic word combinations; removal of lexical homonymy; semantic analysis of nouns; the semantic-syntactic combination of nouns, adjectives and adverbial components; analysis algorithms; a system for dividing the words of the text into flexion and stem; an equivalence thesaurus for replacing equivalent words by one or more concept numbers that serve as content identifiers instead of word stems; a thesaurus in the form of a hierarchy of concepts providing search by a given general or associated concept; and a dictionary service system. The indexing process depends on a descriptor dictionary or an information-retrieval thesaurus. The descriptor dictionary has the structure of a table with three columns: the stems of words; the sets of descriptors attributed to each stem; and the grammatical features of the descriptors. Indexing consists in highlighting informative phrases in the text, decoding abbreviations, replacing words having descriptor stems by the descriptor code, and removing homonymy.

5 Conclusions

The article discusses known methods and approaches to the automatic processing of textual content, highlighting the shortcomings and benefits of existing approaches and results in the syntactic aspects of computational linguistics. It generalizes conceptual principles of modeling inflectional processes in the formation of text arrays, using Ukrainian and German sentences as examples; on this basis, syntactic models and word classifications of the lexical composition of Ukrainian and German sentences are proposed, and lexicographic rules of the syntactic type are developed for automated processing. The technique achieves higher reliability indicators than known analogues and demonstrates high efficiency in applied settings, in the construction of new information technologies for lexicography and for the study of the inflectional phenomena of natural languages. The work is of practical value, since the proposed models and rules make it possible to organize effectively the process of creating lexicographic systems for processing syntactic textual content.

References

1. Angliyskaya grammatika v dostupnom izlozhenii, http://real-english.ru/crash/lesson3.htm, last accessed 2019/11/21.
2. Anisimov, A.V., Marchenko, O.O., Nykonenko, A.O.: Alhorytmichna model asotsiatyvno-semantychnoho kontekstnoho analizu tekstiv pryrodnoyu movoyu. In: Problemy prohramuvannya, 2-3, 379-384 (2008)
3. Anisimov, A.V.: Komp'yuternaya lingvistika dlya vsekh: mify, algoritmy, yazyk. Dumka, Kyiv, Ukraine (1991)
4. Apresyan, Y.D.: Idei i metody sovremennoy strukturnoy lingvistiki. Prosveshcheniye, Moscow (1966)
5. Apresyan, Y.D.: Neposredstvenno sostavlyayushchikh metod, http://tapemark.narod.ru/les/332a.html, last accessed 2019/11/21.
6. Arsent'yeva, N.G.: O dvukh sposobakh porozhdeniya predlozheniy russkogo yazyka.
5 Conclusions
The article discusses known methods and approaches to the automatic processing of textual content and highlights the shortcomings and benefits of existing approaches and results in the syntactic aspects of computational linguistics. It generalizes the conceptual principles of modeling inflectional processes in the formation of text arrays using Ukrainian and German sentences as examples; on this basis, syntactic models and word classifications of the lexical composition of Ukrainian and German sentences are proposed, and lexicographic rules of the syntactic type are developed for automated processing. The application of this technique achieves higher reliability indicators in comparison with known analogues and demonstrates high efficiency in applied settings, in particular in the construction of new information technologies of lexicography and in the study of the inflectional phenomena of natural languages. The work is of practical value, since the proposed models and rules make it possible to organize effectively the process of creating lexicographic systems for processing syntactic textual content.
References
1. Angliyskaya grammatika v dostupnom izlozhenii. http://real-english.ru/crash/lesson3.htm, last accessed 2019/11/21.
2. Anisimov, A.V., Marchenko, O.O., Nykonenko, A.O.: Alhorytmichna modelʹ asotsiatyvno-semantychnoho kontekstnoho analizu tekstiv pryrodnoyu movoyu. In: Probl. Prohramuv., 2-3, 379-384 (2008)
3. Anisimov, A.V.: Komp'yuternaya lingvistika dlya vsekh: mify, algoritmy, yazyk. Dumka, Kyiv, Ukraine (1991)
4. Apresyan, Y.D.: Idei i metody sovremennoy strukturnoy lingvistiki. Prosveshcheniye, Moscow (1966)
5. Apresyan, Y.D.: Neposredstvenno sostavlyayushchikh metod. http://tapemark.narod.ru/les/332a.html, last accessed 2019/11/21.
6. Arsent'yeva, N.G.: O dvukh sposobakh porozhdeniya predlozheniy russkogo yazyka. In: Problemy kibernetiki, 14, 189-218 (1965)
7. Bahmut, A.Y.: Poryadok sliv. In: Ukrayinsʹka mova: Entsykl, 3, 675-676 (2007)
8. Bil'gayeva, N.T.: Teoriya algoritmov, formal'nykh yazykov, grammatik i avtomatov. VSGTU, Ulan-Ude (2000)
9. Bolshakova, Y.I., Klyshinskiy, E.S., Lande, D.V., Noskov, A.A., Peskova, O.V., Yagunova, Ye.V.: Avtomaticheskaya obrabotka tekstov na yestestvennom yazyke i komp'yuternaya lingvistika. MIEM, Moscow (2011)
10. Vysotska, V.: Linguistic Analysis of Textual Commercial Content for Information Resources Processing. In: Modern Problems of Radio Engineering, Telecommunications and Computer Science, TCSET'2016, 709-713 (2016)
11. Volkova, I.A., Rudenko, T.V.: Formal'nyye grammatiki i yazyki. Elementy teorii translyatsii. Izdatel'skiy otdel fakul'teta vychislitel'noy matematiki i kibernetiki MGU im. M.V. Lomonosova (1999)
12. Hakman, O.V.: Heneratyvno-transformatsiyna linhvistyka N. Khomsʹkoho yak vyrazhennya yoho linhvistychnoyi filosofiyi. In: Mulʹtyversum. Filosofsʹkyy alʹmanakh, 45, 98-114 (2005)
13. Gerasimov, A.S.: Lektsii po teorii formal'nykh yazykov. http://gas-teach.narod.ru/au/tfl/tfl01.pdf, last accessed 2019/11/21.
14. Gladkiy, A.V.: Sintaksicheskiye struktury yestestvennogo yazyka v avtomatizirovannykh sistemakh obshcheniya. Nauka, Moscow (1985)
15. Gladkiy, A.V., Mel'chuk, I.A.: Elementy matematicheskoy lingvistiki. Nauka, Moscow (1969)
16. Gladkiy, A.V.: Formal'nyye grammatiki i yazyki. Nauka, Moscow (1973)
17. Gross, M., Lanten, A.: Teoriya formal'nykh grammatik. Mir, Moscow (1971)
18. Darchuk, N.P.: Komp'yuterna linhvistyka (avtomatychne opratsyuvannya tekstu). Kyyivs'kyy universytet VPTS, Kyiv, Ukraine (2008)
19. Demeshko, I.: Typolohiya morfonolohichnykh modeley u viddiyeslivnomu slovotvorenni suchasnoyi ukrayins'koyi movy. In: Zbirnyk naukovykh prats' Linhvistychni studiyi, 19, 162-167 (2009)
20. Zubkov, M.: Ukrayins'ka mova: Universal'nyy dovidnyk. Shkola, Kyiv, Ukraine (2004)
21. Ingve, V.: Gipoteza glubiny. In: Novoye v lingvistike, IV, 126-138 (1965)
22. Lyubchenko, T.P.: Leksykohrafichni systemy hramatychnoho typu ta yikh zastosuvannya v zasobakh avtomatyzovanoho opratsyuvannya movy. Avtoref. dys. kand. tekhn. nauk: spets. 10.02.21, Kyiv, Ukraine (2011)
23. Martynenko, B.K.: Yazyki i translyatsii: Ucheb. posobiye. Izd. 2-ye, ispr. i dop., Izd-vo S.-Peterb. un-ta, St. Petersburg (2008)
24. Marchenko, O.O.: Alhorytmy semantychnoho analizu pryrodnomovnykh tekstiv. Avtoref. dys. na zdobuttya nauk. stupenya kand. fiz.-mat. nauk: spets. 01.05.01, Kyiv, Ukraine (2005)
25. Noskov, S.A.: Samouchitel' nemetskogo yazyka. Nauka, Kyiv, Ukraine (1999)
26. Paducheva, Y.V.: O svyazyakh glubiny po Ingve so strukturoy dereva podchineniy. In: Nauchno-tekhnicheskaya informatsiya, 6, 38-43 (1967)
27. Partyko, Z.V.: Prykladna i komp'yuterna linhvistyka. Afisha, Lviv, Ukraine (2008)
28. Pentus, A.Y., Pentus, M.R.: Teoriya formal'nykh yazykov: Uchebnoye posobiye. Izd-vo TSPI pri mekhaniko-matematicheskom f-te MGU, Moscow (2004)
29. Popov, E.V.: Obshcheniye s EVM na yestestvennom yazyke. Nauka, Moscow (1982)
30. Postnikova, O.M.: Nimetsʹka mova. Rozmovni temy: leksyka, teksty, dialohy, vpravy. T. 1, A.S.K, Kyiv, Ukraine (2001)
31. Postnikova, O.M.: Nimetsʹka mova. Rozmovni temy: leksyka, teksty, dialohy, vpravy. T. 2, A.S.K, Kyiv, Ukraine (2001)
32. Potapova, H.M.: Morfonolohiya viddiyeslivnoho slovotvorennya (na materiali slovotvirnykh hnizd z vershynamy - diyeslovamy ta viddiyeslivnykh slovotvirnykh zon). Dys. kand. nauk: 10.02.02, Ukraine (2008)
33. Rusachenko, N.P.: Morfonolohichni protsesy u slovozmini ta slovotvori staroukrayins'koyi movy druhoyi polovyny XVI - XVIII st. Avtoreferat dysertatsiyi na zdobuttya naukovoho stupenya kandydata filolohichnykh nauk, http://auteur.corneille-moliere.com/?p=history&m=corneille_moliere&l=rus, last accessed 2019/11/21.
34. Torosyan, O.M.: Funktsionalʹni kharakterystyky pryslivnykiv miry ta stupenya v suchasniy anhliysʹkiy movi. Avtoref. dys. na zdobuttya nauk. stupenya kand. filol. nauk, http://disser.com.ua/contents/6712.html, last accessed 2019/11/21.
35. Turysheva, O.O.: Porushennya ramkovoyi konstruktsiyi v suchasniy nimetsʹkiy movi: funktsionalʹnyy aspekt, normatyvnyy status. Avtoref. dys. kand. filol. nauk: spets. 10.02.04 (2012)
36. Ukrayins'kyy pravopys. In-t movoznavstva im. O.O. Potebni NAN Ukrayiny, In-t ukr. movy NAN Ukrayiny, Nauk. dumka, Kyiv, Ukraine (2007)
37. Fomichev, V.S.: Formal'nyye yazyki, grammatiki i avtomaty. http://www.proklondike.com/books/thproch/, last accessed 2019/11/21.
38. Chomsky, N.: O nekotorykh formal'nykh svoystvakh grammatik. In: Kiberneticheskiy sbornik, 5, 279-311 (1962)
39. Chomsky, N., Miller, G.A.: Formal'nyy analiz yestestvennykh yazykov. In: Kiberneticheskiy sbornik, 1, 231-290 (1965)
40. Chomsky, N.: Yazyk i myshleniye. In: Publikatsii OSiPL. Seriya monografiy, 2 (1972)
41. Chomsky, N.: Sintaksicheskiye struktury. In: Sbornik Novoye v lingvistike, 2, 412-527 (1962)
42. Chepurna, Z.V.: Transformatsiya poryadku sliv u prostomu rechenni pry perekladi z nimetsʹkoyi movy ukrayinsʹkoyu. In: Naukovi zapysky, 89(1), 232-236 (2010)
43. Sharov, S.A.: Sredstva komp'yuternogo predstavleniya lingvisticheskoy informatsii. http://www.ksu.ru/eng/science/ittc/vol000/002/, last accessed 2019/11/21.
44. Lytvyn, V., Sharonova, N., Hamon, T., Cherednichenko, O., Grabar, N., Kowalska-Styczen, A., Vysotska, V.: Preface: Computational Linguistics and Intelligent Systems (COLINS-2019). In: CEUR Workshop Proceedings, Vol-2362 (2019)
45. Shulʹzhuk, K.: Syntaksys ukrayinsʹkoyi movy. Akademiya, Kyiv, Ukraine (2004)
46. Babichev, S.: An Evaluation of the Information Technology of Gene Expression Profiles Processing Stability for Different Levels of Noise Components. In: Data, 3(4), art. no. 48, doi: 10.3390/data3040048 (2018)
47. Babichev, S., Durnyak, B., Pikh, I., Senkivskyy, V.: An Evaluation of the Objective Clustering Inductive Technology Effectiveness Implemented Using Density-Based and Agglomerative Hierarchical Clustering Algorithms. In: Advances in Intelligent Systems and Computing, 1020, 532-553, doi: 10.1007/978-3-030-26474-1_37 (2020)
48. Shreyder, Y.A.: Kharakteristiki slozhnosti struktury teksta. In: Nauchno-tekhnicheskaya informatsiya, 7, 34-41 (1966)
49. Chomsky, N.: Three models for the description of language. In: I.R.E. Trans. PGIT, 2, 113-124 (1956)
50. Chomsky, N.: On certain formal properties of grammars; A note on phrase structure grammars. In: Information and Control, 2, 137-167, 393-395 (1959)
51. Chomsky, N.: On the notion «Rule of Grammar». In: Proc. Symp. Applied Math., 12, Amer. Math. Soc. (1961)
52. Chomsky, N.: Context-free grammars and pushdown storage. In: Quarterly Progress Reports, 65, Research Laboratory of Electronics, M.I.T. (1962)
53. Chomsky, N.: Formal properties of grammars. In: Handbook of Mathematical Psychology, 2, ch. 12, Wiley, 323-418 (1963)
54. Chomsky, N.: The logical basis for linguistic theory. In: Int. Cong. Linguists (1962)
55. Chomsky, N., Miller, G.A.: Finite state languages. In: Information and Control, 1, 91-112 (1958)
56. Chomsky, N., Miller, G.A.: Introduction to the formal analysis of natural languages. In: Handbook of Mathematical Psychology, 2, ch. 12, Wiley, 269-322 (1963)
57. Chomsky, N., Schützenberger, M.P.: The algebraic theory of context-free languages. In: Computer Programming and Formal Systems, North-Holland, Amsterdam, 118-161 (1963)
58. Bar-Hillel, Y., Shamir, E.: Finite state languages: formal representation and adequacy problems. In: Bulletin of the Research Council of Israel, 8F(3), 155-166 (1960)
59. Bobrow, D.G.: Syntactic analysis of English by computer - a survey. In: AFIPS Conference Proceedings, 24, Baltimore, London, 365-387 (1963)
60. English Verbs (Part 1) - Basic Terms. http://sites.google.com/site/englishgrammarguide/Home/english-verbs--part-1----basic-terms, last accessed 2019/11/21.
61. Hays, D.G.: Automatic language data processing. In: Computer Applications in Behavioral Sciences, Englewood Cliffs, 394-421 (1962)
62. Postal, P.M.: Limitations of phrase structure grammars. In: The Structure of Language. Readings in the Philosophy of Language, Englewood Cliffs, 137-151 (1964)
63. Tesniere, L.: Elements de syntaxe structurale (1959)
64. Tosh, L.W.: Syntactic translation. The Hague (1965)
65. Yngve, V.H.: A model and a hypothesis for language structure. In: Proceedings of American Philosophical Society, 104(5), 444-466 (1960)
66. Yngve, V.H.: Random generation of English sentences. In: National Physical Laboratory, paper 6, Teddington (1961)
67. Su, J., Vysotska, V., Sachenko, A., Lytvyn, V., Burov, Y.: Information resources processing using linguistic analysis of textual content. In: Intelligent Data Acquisition and Advanced Computing Systems Technology and Applications, Romania, 573-578 (2017)
68. Varga, D.: Yngve's hypothesis and some problems of the mechanical analysis. In: Computational Linguistics, III, 47-74 (1964)
69. Khomytska, I., Teslyuk, V., Holovatyy, A., Morushko, O.: Development of methods, models, and means for the author attribution of a text. In: Eastern-European Journal of Enterprise Technologies, 3(2-93), 41-46 (2018)
70. Khomytska, I., Teslyuk, V.: Authorship and Style Attribution by Statistical Methods of Style Differentiation on the Phonological Level. In: Advances in Intelligent Systems and Computing III, AISC 871, Springer, 105-118, doi: 10.1007/978-3-030-01069-0_8 (2019)
71. Lytvyn, V., Vysotska, V., Pukach, P., Nytrebych, Z., Demkiv, I., Kovalchuk, R., Huzyk, N.: Development of the linguometric method for automatic identification of the author of text content based on statistical analysis of language diversity coefficients. In: Eastern-European Journal of Enterprise Technologies, 5(2), 16-28 (2018)
72. Vysotska, V., Lytvyn, V., Hrendus, M., Kubinska, S., Brodyak, O.: Method of textual information authorship analysis based on stylometry. In: 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2018, 9-16 (2018)
73. Lytvyn, V., Vysotska, V., Pukach, P., Nytrebych, Z., Demkiv, I., Senyk, A., Malanchuk, O., Sachenko, S., Kovalchuk, R., Huzyk, N.: Analysis of the developed quantitative method for automatic attribution of scientific and technical text content written in Ukrainian. In: Eastern-European Journal of Enterprise Technologies, 6(2-96), 19-31 (2018)
74. Vysotska, V., Burov, Y., Lytvyn, V., Demchuk, A.: Defining Author's Style for Plagiarism Detection in Academic Environment. In: Proceedings of the 2018 IEEE 2nd International Conference on Data Stream Mining and Processing, DSMP 2018, 128-133 (2018)
75. Vysotska, V., Fernandes, V.B., Lytvyn, V., Emmerich, M., Hrendus, M.: Method for Determining Linguometric Coefficient Dynamics of Ukrainian Text Content Authorship. In: Advances in Intelligent Systems and Computing, 871, 132-151 (2019)
76. Lytvyn, V., Vysotska, V., Burov, Y., Bobyk, I., Ohirko, O.: The linguometric approach for co-authoring author's style definition. In: Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems, IDAACS-SWS 2018, 29-34 (2018)
77. Vysotska, V., Kanishcheva, O., Hlavcheva, Y.: Authorship Identification of the Scientific Text in Ukrainian with Using the Lingvometry Methods. In: 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2018 - Proceedings, 2, 34-38 (2018)
78. Lytvyn, V., Vysotska, V., Peleshchak, I., Basyuk, T., Kovalchuk, V., Kubinska, S., Chyrun, L., Rusyn, B., Pohreliuk, L., Salo, T.: Identifying Textual Content Based on Thematic Analysis of Similar Texts in Big Data. In: 2019 IEEE 14th International Scientific and Technical Conference on Computer Science and Information Technologies (CSIT), 84-91 (2019)
79. Vysotska, V., Lytvyn, V., Kovalchuk, V., Kubinska, S., Dilai, M., Rusyn, B., Pohreliuk, L., Chyrun, L., Chyrun, S., Brodyak, O.: Method of Similar Textual Content Selection Based on Thematic Information Retrieval. In: 2019 IEEE 14th International Scientific and Technical Conference on Computer Science and Information Technologies (CSIT'2019), 1-6 (2019)
80. Cherednichenko, O., Babkova, N., Kanishcheva, O.: Complex Term Identification for Ukrainian Medical Texts. In: CEUR Workshop Proceedings, Vol-2255, 146-154 (2018)
81. Bobicev, V., Kanishcheva, O., Cherednichenko, O.: Sentiment Analysis in the Ukrainian and Russian News. In: First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 1050-1055 (2017)
82. Fedushko, S., Benova, E.: Semantic analysis for information and communication threats detection of online service users. In: The 10th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN), 160, 254-259 (2019)
83. Antonyuk, N., Chyrun, L., Andrunyk, V., Vasevych, A., Chyrun, S., Gozhyj, A., Kalinina, I., Borzov, Y.: Medical News Aggregation and Ranking of Taking into Account the User Needs. In: CEUR Workshop Proceedings, Vol-2362, 369-382 (2019)
84. Chyrun, L., Chyrun, L., Kis, Y., Rybak, L.: Automated Information System for Connection to the Access Point with Encryption WPA2 Enterprise. In: Lecture Notes in Computational Intelligence and Decision Making, 1020, 389-404 (2020)
85. Kis, Y., Chyrun, L., Tsymbaliak, T., Chyrun, L.: Development of System for Managers Relationship Management with Customers. In: Lecture Notes in Computational Intelligence and Decision Making, 1020, 405-421 (2020)
86. Chyrun, L., Kowalska-Styczen, A., Burov, Y., Berko, A., Vasevych, A., Pelekh, I., Ryshkovets, Y.: Heterogeneous Data with Agreed Content Aggregation System Development. In: CEUR Workshop Proceedings, Vol-2386, 35-54 (2019)
87. Chyrun, L., Burov, Y., Rusyn, B., Pohreliuk, L., Oleshek, O., Gozhyj, A., Bobyk, I.: Web Resource Changes Monitoring System Development. In: CEUR Workshop Proceedings, Vol-2386, 255-273 (2019)
88. Gozhyj, A., Chyrun, L., Kowalska-Styczen, A., Lozynska, O.: Uniform Method of Operative Content Management in Web Systems. In: CEUR Workshop Proceedings, Vol-2136, 62-77 (2018)
89. Chyrun, L., Gozhyj, A., Yevseyeva, I., Dosyn, D., Tyhonov, V., Zakharchuk, M.: Web Content Monitoring System Development. In: CEUR Workshop Proceedings, Vol-2362, 126-142 (2019)