<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Information Engineering and Electronic Business 16(5) (2024) 1</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-030-01069-0_8</article-id>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>CIAW-2024: Computational Intelligence Application Workshop</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera 12, 79013 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2</volume>
      <fpage>105</fpage>
      <lpage>118</lpage>
      <abstract>
        <p>The work aims to develop models, methods, and means of analysis and synthesis of computer linguistic systems (CLS) based on new and improved methods of processing Ukrainian-language textual content to solve natural language processing (NLP) problems. The scientific novelty of the obtained results lies in solving an important scientific and applied problem of the analysis and synthesis of CLS for various tasks of processing Ukrainian-language textual content, based on developing new and improving known models, methods, and means of NLP. The following new scientific results were obtained: a model of intellectual analysis of the text flow which, unlike existing ones, is based on the processing of information resources, NLP, and machine learning, and which defines the typical structures of the content integration, management, and support modules; methods of adapted processing of information resources for Ukrainian-language text that take into account the needs of the permanent target audience based on the analysis of the history of the target audience's activity on the CLS web resource, which made it possible to form a set of metrics and indicators of the effectiveness of CLS functioning for various NLP tasks; a model of linguistic processing of text based on improved grapheme, morphological, lexical, and syntactic analyses which, unlike existing ones, are adapted for processing Ukrainian-language text through regular expressions and machine learning, which made it possible to adapt the processing of Ukrainian-language textual content and to increase the accuracy of the obtained results depending on the specific NLP task; a method of identifying keywords in Ukrainian-language texts based on grapheme and morphological analysis of word bases through regular expressions and N-grams, which made it possible to increase the accuracy of keyword search, to find stable word combinations, and to categorize content; a method of determining the style of the author of thematic Ukrainian-language textual content based on the analysis of keywords, stable word combinations, and N-grams, which made it possible to determine the stylistic contribution of each author and to increase the accuracy of the attribution of a scientific and technical publication; a method for calculating the degree of verification of the author of a Ukrainian-language text from a set of possible authors based on a comparative analysis of the styles of potential authors, which made it possible to increase the accuracy of classification based on stylistic similarity; methods of analysis and synthesis of CLS based on a general typical structure of a CLS for processing Ukrainian-language textual content, with support for modularity and the modelling of the interaction of the main processes and components, which made it possible to expand the collection of solutions to typical NLP tasks by implementing typical software for such systems; NLP methods which, unlike existing ones, are implemented on the basis of developed regular expressions for the grapheme and morphological analysis of Ukrainian-language text and a modified Porter stemming algorithm that effectively identifies affixes in order to demarcate the analysed word, which made it possible to optimize the process and to improve the accuracy of the normalization of Ukrainian words and sentences; and text tokenization and normalization methods which, in contrast to existing ones, use cascades of simple substitutions of developed regular expressions matched against templates based on production rules, finite automata, and an ontological model of the rules of Ukrainian syntax.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer linguistic systems</kwd>
        <kwd>NLP</kwd>
        <kwd>Ukrainian-language</kwd>
        <kwd>textual content</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The active development of information technologies (IT) is at the intersection of globalization and
informatization. The rapid rate of growth of society's informatization is directly related to the rate
of development and implementation of computer linguistic systems (CLS), the development of which
is based on models and methods of natural language processing (NLP) [1-3]. The complexity of
developing models, techniques, and tools of NLP lies in solving non-typical NLP problems and
adapting these models, methods, and tools to a specific natural language [4-6]. Each natural language
is unique, with its own rules, history, grammar, exceptions, and peculiarities of generating
linguistic units to convey meaning, which complicates the development of a CLS.</p>
      <p>Usually, each successful CLS development project is designed for a specific task (for example,
machine translation [7-9], identification of plagiarism/rewriting [10-12], text rubrication [13-14], text
attribution analysis [15-21], information retrieval [22-28], referencing/abstracting [29-30], voice
assistants [31-33], intelligent chatbots [34-39], etc.) and is both one-time and closed (for example,
Amazon Alexa, Google Assistant, Facebook, Voice Mate, Bixby, Siri, Abby Lingvo, Microsoft Cortana,
Microsoft Word, Grammarly, Google Translation, PROMT, CuneiForm, Trados, OmegaT, Wordfast,
Dragon, IBM ViaVoice, Speereo, FineReader, Tesseract, OCRopus, etc.), with its structure and
content inaccessible to interested IT professionals/specialists. In rare cases, the developers provide
open access to such CLS projects and the opportunity to get acquainted with their structure and
content. The development of any NLP application for an arbitrary natural language, out of the more
than 7,000 languages and dialects, is based on studying large monolingual/parallel text corpora of
that language, containing hundreds of millions of words, and on linguistic resources. Only for about
20 natural languages (English, Chinese, Western European languages, Japanese, etc.) are the results
of research on such corpora known, making it possible to develop CLS of various complexity for these languages.
Unfortunately, in modern realities, the Ukrainian language is considered in the international
scientific community to be an exotic language with a low resource index, i.e., it does not have enough
educational, research, or processed data to develop modern applied NLP applications. Such
applications are used to build CLS in cyber security (detection of fakes, propaganda, and so-called
trolls/bots in social networks), sociology (analysis of the dynamics of changes in public opinion
on thematic issues), philology (automatic research of large data sets of various thematic orientations
and different periods), psychology (analysis of the psychological portrait of a person, identification
of post-traumatic stress disorder in participants of hostilities or occupation), national security
(information warfare), jurisprudence (criminology and court cases), social communications (analysis
of community posts in social networks), and other important branches of modern Ukraine. The above
determines the relevance of the topic of the dissertation research.</p>
      <p>Scientific research by N. Chomsky, V.M. Glushkov, A.V. Hladkoy, D.V. Lande, V.A. Shyrokov,
N.V. Sharonova, N.F. Khairova, O.V. Bisikalo, S.N. Buk, N.P. Darchuk, Z.V. Partyka, A.V. Anisimova,
Yu.D. Apresyan, O.O. Marchenko, I.M. Kulchytskyi, A.O. Nikonenko, M. Gross, A. Lanten, V.H.
Yngve, S. Sharoff, Yu.A. Schrader, D. Jurafsky, B. Bengfort, J.H. Martin, L. Tesniere, T. Ojeda, P.M.
Postal, D.G. Hays, T.A. van Dijk, S. Marcus, J. Lyons, L.W. Tosh, Y. Bar-Hillel, D.G. Bobrow, G. Lakoff,
R. Bilbro, N. Kotsyba, A.Yu. Berko, Yu.M. Shcherbyna, V.Yu. Velychko, V.F. Starko and many others
make it possible to understand the basic principles of linguistic processing of the text depending on
the features of a specific natural language. More than 80% of such studies concern the processing of
English-language texts. There are fewer studies on Slavic languages, particularly the low-resource
Ukrainian language. In particular, there are no publications regarding the development
recommendations, functional requirements, general structure, or typical architecture of the CLS for
processing Ukrainian-language textual content. Directly applying the English language's models,
methods, algorithms, and IT processing to Ukrainian-language textual content does not yield positive
results. Already at the level of morphological analysis, a significant conflict arises between the
methods developed for English-language text and their use for Ukrainian-language text. For
example, the basic Porter stemming algorithm, without appropriate modification, cannot correctly
separate the base of a word from its inflexion, which leads to inaccurate identification of key
phrases and, in turn, affects the solution of any NLP problem that requires quickly
identifying a set of keywords (categorization, search, annotation, etc.). Determining the main features and
processes of linguistic analysis of Ukrainian-language texts will significantly facilitate the stages of
processing the text flow of information, such as integration, support and content management. In
turn, the adaptation of the processes of intellectual analysis of text content with the identification of
functional requirements for the relevant modules of the CLS will lead to the possibility of developing
its typical architecture based on the principle of modularity (adding components depending on the
content of the NLP task and the purpose of the CLS).</p>
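      <p>As a minimal illustration of the stemming conflict described above, the following sketch strips a few common Ukrainian inflexional endings longest-first. This is an illustrative assumption, not the modified Porter algorithm developed in the work, and the ending list is deliberately incomplete.</p>
      <preformat>
```python
# Toy sketch: longest-first stripping of a handful of Ukrainian endings.
# The ending list is an assumption for illustration only.
UK_ENDINGS = sorted(
    ["ами", "ями", "ого", "ому", "ої", "ів", "ам", "ах", "и", "і", "а", "у", "о"],
    key=len, reverse=True)

def naive_uk_stem(word: str) -> str:
    """Return the word base after removing the longest matching ending."""
    for ending in UK_ENDINGS:
        # Keep at least three characters so the base remains recognizable.
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[:-len(ending)]
    return word

print(naive_uk_stem("книгами"))  # "книг" - an unmodified English stemmer would leave it intact
```
      </preformat>
      <p>Even this naive cascade shows why rules tuned to English suffixes cannot be reused directly: the inflexional inventory itself must be replaced.</p>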
      <p>The above testifies to the relevance of research in solving the significant scientific and applied
problem of analysis and synthesis of CLS for solving various tasks of processing Ukrainian-language
textual content, which will make it possible to increase the level of resourcefulness of the natural
Ukrainian language based on the development of new and improvement of known models, methods
and means of NLP.</p>
      <p>The work aims to develop models, methods, and means of analysis and synthesis of computer
linguistic systems based on new and improved known methods of processing Ukrainian-language
textual content to solve problems of natural language processing. Achieving this aim requires
solving the following tasks:</p>
      <p>To analyse the specifics of the construction of the CLS by systematizing the processes of their
implementation and functioning, which will provide an opportunity to distinguish a class of
systems whose functional properties allow a quantitative assessment of the
expected effects of implementing a typical CLS for processing Ukrainian-language
textual content to solve various NLP tasks;
To develop information technology for the construction of CLS for the processing of
Ukrainian-language text, which will make it possible to determine their basic structure,
functional requirements, the sequence of setting and training the system, and general design
principles;
To offer IT processing of information resources as integration, management and support of
Ukrainian-language content based on the improvement of linguistic analysis of text content
for the development of metrics for evaluating the effectiveness of the functioning of the CLS
for solving various tasks of the NLP;
To develop methods of processing Ukrainian-language textual content for solving various
problems of NLP to increase the accuracy of the obtained results;
To develop methods and means of intellectual analysis of textual content to increase the
efficiency of solving various tasks of NLP;
Create software modules for processing Ukrainian-language textual content for solving
various tasks of NLP and conducting experiments;
To test the obtained results by building and implementing applied CLS to process
Ukrainian-language textual content.</p>
      <p>The object of research is the processes of analysis and synthesis of computer linguistic systems
for processing Ukrainian-language textual content.</p>
      <p>The research subject is models, methods, and means of processing Ukrainian-language textual
content to solve various problems of NLP.</p>
      <p>The following research methods were used to achieve the goal: the theory of formal grammars
and automata, the theory of sets, the theory of data and knowledge models, the theory of probability
and mathematical statistics, the theory of models, algorithms, and logical-linguistic numbers,
information theory, graph theory, and knowledge presentation methods for modelling the processes
of processing Ukrainian-language textual content and developing machine learning modules; models
and methods of processing and analysing textual content for the implementation of the processes of
solving various problems of NLP; methods of object-oriented and system analysis and design - for
design and development of CLS; the theory of relational databases, methods of artificial intelligence,
object-oriented programming - for the software implementation of the Ukrainian-language textual
content processing system for the solution of various NLP tasks. The practical significance of the
obtained results lies in the fact that they can be used to build applied CLS for processing
Ukrainian-language textual content. In particular, the following results are practically valuable:</p>
      <p>The application of the method of identification of persistent word combinations in the
identification of keywords in Ukrainian-language scientific texts of a technical profile allows
an increase in the accuracy of the search for keywords by 6-9% and highlights thematic terms
from the text for further classification of the publication;
Development of a formal approach to the design of a content monitoring module for
identifying keywords in Ukrainian-language texts based on web data mining, NLP and
linguistic analysis of defined words of text content, which made it possible to develop the
general structure of typical CLS and increase the effectiveness of CLS functioning by 6-9%
depending on the solution of a specific NLP problem;
The application of the method of calculating the degree of verification of the author of the
Ukrainian-language text based on the analysis of the styles of potential authors made it
possible to increase the accuracy of identification by 6-12% and carry out the decomposition
of the method through the study of stylistic coefficients such as the coherence of speech, the
degree of syntactic complexity, linguistic diversity, indices of concentration and exclusivity
of the text;
Development of a content monitoring module to identify a potential author of a text from a
set of possible ones based on a comparison of the results of the analysis of a template author’s
text with the researched one to reduce the volume of the corresponding set to [9;34]% of the
total number of project participants, depending on the subject and the time range of scientific
writing - technical publications, as well as the frequency of publications of this author in this
period on a specific topic;
Experimental testing of the method of identifying the author’s style in Ukrainian-language
texts based on web data mining and linguistic analysis of defined stop words allows the
selection of content potentially similar in style from a set of potential author’s publications.</p>
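      <p>The keyword and stable-word-combination results above rest on N-gram counting. A minimal sketch of that idea, under the simplifying assumption that a "stable combination" is just an adjacent word pair repeated at least a threshold number of times (the dissertation's actual criteria are richer):</p>
      <preformat>
```python
from collections import Counter

def stable_bigrams(tokens, min_count=2):
    # Count adjacent word pairs; pairs occurring at least min_count times
    # approximate "stable word combinations" under this toy frequency criterion.
    pairs = Counter(zip(tokens, tokens[1:]))
    return sorted(pair for pair, n in pairs.items() if n >= min_count)

toks = "обробка природної мови та аналіз природної мови".split()
print(stable_bigrams(toks))  # [('природної', 'мови')]
```
      </preformat>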
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Determining the main processes and features of the linguistic analysis of Ukrainian-language texts
will significantly facilitate the stages of processing the text flow of content such as integration,
support and content management (Fig. 1). Adaptation of the processes of intellectual analysis of text
content with the identification of functional requirements for the relevant modules of the CLS will
lead to the possibility of developing a typical structure of similar systems based on the principle of
modularity (adding components depending on the content of the NLP task and the purpose of the
CLS). The application of the specified IT/methods/models in the typical structure of the CLS,
adapted for any process of processing Ukrainian-language textual content, is a necessary
prerequisite for the successful implementation of a CLS project for solving a specific NLP task;
it requires an appropriate set of standard libraries, utilities, and open-source software that
implement the specialized functions of the project according to the needs of the end user. The state
of the CLS is determined by a tuple of its main properties at a specific moment in time or by the
activity of the corresponding NLP process: S = (s1, s2, …, sn), i = 1..n, where si is the i-th state
at a specific moment in time ti from the set with power |S| = n. Each state is characterized by
properties P = (p1, p2, …, pm), j = 1..m, from the set with power |P| = m, which determine the
behaviour of the CLS; each property pj has a corresponding set of parameters for the state si. For
any CLS, a state si is one of the NLP processes, for example, the identification of keywords and/or
stable phrases, followed by the next state si+1 of the system, such as the rubrication of a text
array. Accordingly, the properties of the state si are morphological, lexical, and syntactic; some
NLP tasks also add semantic properties. For each property, a set of parameters is then determined
for the corresponding text analysis, depending on the specific NLP task [40-50]. According to these
parameters, the strategy of the CLS operation at the moment of time ti is specified for:</p>
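      <p>The state/property tuples described above can be rendered as a small data structure. This is an illustrative sketch only; all names here are assumptions, not the authors' notation:</p>
      <preformat>
```python
from dataclasses import dataclass, field

@dataclass
class CLSState:
    name: str                                        # an NLP process, e.g. keyword identification
    properties: dict = field(default_factory=dict)   # property name -> list of its parameters

# Two consecutive states s_i and s_{i+1}: keyword identification, then rubrication.
s1 = CLSState("keyword_identification",
              {"morphological": ["roots", "affixes", "endings"],
               "lexical": ["word_weight", "sentence_weight"]})
s2 = CLSState("rubrication", {"syntactic": ["dependency_depth"]})
pipeline = [s1, s2]  # S = (s1, s2, ..., sn): states the CLS passes through
print([s.name for s in pipeline])  # ['keyword_identification', 'rubrication']
```
      </preformat>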
      <p>[Fig. 1 appeared here as an image: the typical CLS structure, comprising a client subsystem (web site, DB profiles, content support module), a server subsystem (content management module, module of linguistic analysis of Ukrainian-language textual content, module for solving a specific NLP problem of Ukrainian-language textual content, machine learning module, knowledge base), and a technological subsystem (content integration module, content, data, repository), connected to the Internet.]</p>
      <p>the parameters of the morphological property are N-grams and morphemes (roots, endings, affixes), the grammatical categories of the different parts of speech, word length, word placement in a sentence, the number of syllables in a word, the number of word meanings, the ratio of consonants to vowels, etc.; the parameters of the lexical property are the location of the sentence in the text, the location of the word in the sentence, the weight of the word, the weight of the sentence, the base of the word, the inflexion of the word, etc.; the parameters of the syntactic property are the depth of the word in the dependency tree, the location of the word in the sentence, the number of word meanings, the number of words per sentence, the numbers of words and sentences, and whether the word is capitalized, hyphenated, or compound, etc.; the parameters of the semantic property are the number of word meanings, the depth of the word in the dependency tree, the size of paragraphs, and the placement of paragraphs.</p>
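      <p>One of the morphological parameters listed above, the ratio of consonants to vowels, is simple enough to sketch directly (the vowel set is the standard Ukrainian one; the function itself is an illustration, not the dissertation's implementation):</p>
      <preformat>
```python
UK_VOWELS = set("аеєиіїоуюя")

def consonant_vowel_ratio(word: str) -> float:
    # Morphological parameter: ratio of consonants to vowels in a word.
    w = word.lower()
    vowels = sum(ch in UK_VOWELS for ch in w)
    consonants = sum(ch.isalpha() and ch not in UK_VOWELS for ch in w)
    return consonants / vowels if vowels else float("inf")

print(consonant_vowel_ratio("мова"))  # 2 consonants / 2 vowels -> 1.0
```
      </preformat>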
      <p>Depending on the tuple of states and properties, the behaviour of the CLS is determined, that is, the implementation of a set of rules (the activation of actions or events) for a specific NLP process depending on the input text data. Accordingly, an event is the change of one property to another, or of one state to the next, according to the fulfilment of certain conditions for the input analysed text and the intermediate processed text. An action is the process of activation of one event by another event in the CLS. The more complex the language (morphology, syntax, etc.), the more difficult it is to process the corresponding texts in natural language. In addition, for low-resource languages such as Ukrainian, there are no standardized rules and dictionaries for processing natural-language texts to solve the relevant NLP tasks. Many scientific linguistic schools and IT specialists are working on creating Ukrainian dictionaries, text corpora, and rules for processing Ukrainian texts. However, these are usually linguists and philologists unfamiliar with the features of specific modern tools, such as programming languages, ML methods, big data analysis, etc. There is a colossal gap between the research results of philologists and applied linguists, on the one hand, and IT specialists, on the other, in developing Ukrainian-language texts. Today, very few NLP tools for languages such as Ukrainian have been implemented for general access.</p>
      <p>The presence of the text content support module reduces costs for moderators/analysts who collect/analyse statistical data on the dynamics of CLS functioning, the activity of the permanent target audience as a reaction to website content changes, and the formation of rules for the analysis of user information portraits and thematic content plots. The module is defined as a tuple of indicators: the cost or utility of the purpose of the visit; the average ROI, or average return on investment; the percentage of profit from new visitors; the index of new buyers/customers at the first visit; the advertising quality index; a brand-recognition factor; the index and percentage conversion of goals by type of advertising; and the conversion rate of goals by type of means.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Material and methods</title>
      <sec id="sec-3-1">
        <title>The developed typical structure of CLS</title>
        <p>The developed typical structure of the CLS consists of modules for solving a specific task of NLP, content support, content integration, content management, linguistic analysis, and intelligent analysis of textual content flows (IATCF) [48]. Each module is defined as a tuple of its performance indicators. For the module that solves a specific NLP problem, these indicators include: a function for determining the percentage of visits from advertisement w; a function for determining the percentage conversion of goals for visits from w; a function for determining the index of the advertising quality of w; the total number of user queries of intellectual and informational search (IIS) by keywords; the number of direct visits to the website; and the number of IIS requests with the brand name.</p>
        <p>The presence of the text content integration module reduces the costs of CLS moderators and content authors by automating some of their work/functions, such as content collection from several different reliable sources, its recognition, filtering, saving, formatting, analysis, annotation, classification, etc. The module is defined as a tuple of indicators: the percentage of repeat visits by a user since the previous visit within given time windows; a brand-recognition factor; the percentage of new/repeated visitors and their interest; the average number of clicks on advertising per visit; the bounce rate for a single web page; the average number of web-page views per visit; and the average length of stay on a web page.</p>
        <p>The indicator of internal search on the site is likewise defined as a tuple: the average number of page views per visit and for a specific time t; the average number of unique users for a specific time t; the average number of visits for a specific time t; the number of zero search results; the percentage of users who remained on a page longer than a threshold time and viewed more than a threshold number of pages after the search; the percentage of buyers among users who use search; the percentage of rejections after visiting one page as a search result; the percentage conversion from users who use search; the percentages of users who do and do not use search; the average number of pages viewed by visitors after a search; the average time spent on the site per visit after a search; the percentages of visitors who conduct several searches during a visit and who left the site after viewing the search results; the average number of search results; the percentage of visits with search; and the percentage of zero search results. These indicators are computed from the number of direct web-page visits, the number of one-page visits, the number of visits selected for analysis, the total number of visits, the average number of clicks on advertising, the total number of actions on the page, and the total numbers of all users and of interested users.</p>
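        <p>One internal-search indicator from the list above, the percentage of zero-result searches, can be sketched as follows (the log format is an assumption for illustration):</p>
        <preformat>
```python
def zero_result_rate(search_logs):
    # search_logs: list of (query, number_of_results) pairs.
    # Returns the % of searches that returned zero results.
    if not search_logs:
        return 0.0
    zero = sum(1 for _query, n_results in search_logs if n_results == 0)
    return 100.0 * zero / len(search_logs)

logs = [("нлп", 12), ("корпус", 0), ("стемінг", 3), ("лема", 0)]
print(zero_result_rate(logs))  # 50.0
```
        </preformat>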
        <p>The presence of a text content management module reduces costs for moderators/administrators who update the website and create rules for caching/searching popular information blocks. Its tuple of indicators includes: an indicator of internal IIS; the percentage of page editions issued with an error; the percentages of mobile users and of users with a high-speed Internet connection; the percentages of users with low/medium/high display resolution and with a specific operating system; the percentages of users with a specific browser and with English and/or Ukrainian language support; and an indicator of the numbers of users, views, and page visits, which is the base of the content management module. The latter is computed from the numbers of pages issued with an error and of viewed pages, the number of zero search results, and the numbers of visits with and without search.</p>
        <p>The presence of a module for the intellectual analysis of text streams of content reduces the time/costs/personnel/resources needed for the timely and prompt acquisition of relevant, unique, current content, which increases the volume of the target audience of the CLS and, in particular, contributes to the growth of the economic effect of its implementation. Its tuple of indicators includes: the average conversion rate; the average length of a visit; the average number of views per visit; the percentage of unique customers/visitors/users; the percentage of new website customers; the percentage of interaction with the site (for example, commenting, voting, registration, authorization, subscription, etc.); the percentage of users who activate various events (for example, clicking on an ad, starting a function, pausing, etc.); the percentage of users interacting with different types of content presentation (viewing the next communication, panning, zooming, etc.); the value of the measure of usefulness of the page/site/CLS/content; the number of unique page views; the profit from e-business; and the value of the utility measure of user visits (based on transactions) and of the purpose of user visits (based on the utility of goals).</p>
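        <p>The percentage of visits with at least one site interaction, one of the IATCF-module indicators above, can be sketched like this (the event-log shape is an assumed simplification):</p>
        <preformat>
```python
def interaction_rate(events, total_visits):
    # events: list of (visit_id, action) pairs, e.g. commenting, voting, registering.
    # Returns the % of visits during which at least one interaction occurred.
    if not total_visits:
        return 0.0
    interacted = {visit_id for visit_id, _action in events}
    return 100.0 * len(interacted) / total_visits

events = [(1, "comment"), (1, "vote"), (3, "register")]
print(interaction_rate(events, 4))  # 2 interacting visits out of 4 -> 50.0
```
        </preformat>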
        <p>The analysis of the success/effectiveness of operational search on the site is likewise defined as a tuple whose components are: the value of the usefulness of visiting the site/page; the conversion rating in e-business for the CLS corresponding to the NLP task; the value of average utility; the value of e-business profit for the CLS of the corresponding NLP task; and the value of the achieved conversion of visits to the site/page of the CLS. According to the tracking of events and interaction with the site, these indicators are analysed against the input data from the tuple of module indicators.</p>
        <p>The method of determining the effectiveness/quality of the CLS site for solving the NLP problem:</p>
        <p>Stage 1. Formulation and identification of usefulness according to the goals of the target audience.</p>
        <p>Stage 2. Activation of reports on the operation of the CLS from the tuple of initial data: Step 1. Define a set of goals (4 goals for each target-audience profile). Step 2. Identify the optimal volume of visits/time of the end user/customer for a successful conversion. Step 3. Analyse the volume of the contribution of each goal to the total profit. Step 4. Combine goals by categories/directions/types. Step 5. Form separate sets of transactions as appropriate for the goals.</p>
        <p>To attract new visitors and increase the volume of the permanent target audience, the calculation of the impact of the IIS on site income is used, based on the number of visits from the IIS and the utility of visits without and with the IIS. The topic of a set of keywords is one of the main indicators of the IIS for identifying the specific content of a page. Investment is optimized for the sets of keywords that increase conversion values. The return on investment must be positive, ROI = (profit − cost)/cost · 100% &gt; 0. They then find what share q% of funds can be spent on a specific keyword in advertising without the risk of obtaining ROI &lt; 0, and use this to calculate the amount of funds for attracting users.</p>
        <p>Stage 3. Support for various marketing campaigns/customers.</p>
        <p>Stage 4. Support for the processing of the service content of the site with the content support module.</p>
        <p>Stage 5. Updating the profiles of the target audience according to feedback, and analysing user actions, through the corresponding modules.</p>
        <p>Stage 6. Integrating content from different sources through the content integration module according to the achieved goals and processing it through the corresponding module.</p>
        <p>Stage 7. Periodic checks are performed to see whether the goals are being achieved and whether the profit is growing accordingly. If it subsides, go to Stage 1; otherwise, go to Stage 2.</p>
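        <p>The ROI criterion used above is the standard return-on-investment formula; a keyword advertising spend is considered worthwhile only while the value stays positive. A direct sketch:</p>
        <preformat>
```python
def roi_percent(profit, cost):
    # ROI = (profit - cost) / cost * 100%; spend on a keyword only while ROI > 0.
    return (profit - cost) / cost * 100.0

print(roi_percent(1500.0, 1000.0))  # 50.0: the keyword campaign pays off
print(roi_percent(800.0, 1000.0))   # -20.0: spending should be reduced
```
        </preformat>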
<p>A classified list of the input stream of content  with a set of relevant properties demarcates project participants through their typification and restriction of access rights depending on the content: regular users, potential visitors, linguists, statistical analysts, administrators, content/rules moderators, authors of unique content, information resources as content sources, etc. The typed structure of the content input stream template with a set of relevant properties helps to define the main functional requirements for the site/CLS and its typical structure, delineate the non-functional capabilities, classify the sources, and calculate the frequencies and the corresponding restrictions/conditions of integration from the usual source:
 =&lt;  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  &gt;, (16)
where</p>
<p>is the URL addresses of sources for the databases (DB) of CLS filters;  is content as a result of integration from different sources according to a predetermined list of URLs, without a predetermined structure, according to relevant thematic requests;  is the thematic requests of visitors/users of the CLS site in the form of a set of keywords or persistent phrases;  is the actual data of permanent users/profiles and the set of rules of permitted actions within the corresponding type of CLS user;  is statistical data of actions/events/phenomena of the subjects/objects of the CLS for the solution of the corresponding NLP task and the rules for collecting/saving/analysing statistics in specific time intervals of the CLS operation;  is statistical data on the functioning of the CLS;  is the contents of the DB/DS of content/rules/filters/annotations, etc., of the CLS;  is different types of linguistic dictionaries depending on the purpose of the CLS for solving a specific</p>
      </sec>
      <sec id="sec-3-2">
        <title>NLP problem;</title>
<p>is a set of personalized/anonymous reviews and comments of users on the relevant content of the CLS;  is a tuple of the results of personalized/anonymous votes of regular/potential users regarding the content of the CLS;  is statistical data on the personalized individual actions of CLS users;  is a set of external/internal advertising of thematic content;  is thematic stickers of information content (exchange rates, announcements, digests, weather, anecdotes, horoscope, etc.);  is a tuple of options for setting up and changing the CLS/site configurations.</p>
<p>Filling the tuple of the output data stream  according to the purpose of the CLS for solving a specific NLP problem directly depends on the content of the input classified stream of content  with a predetermined set of properties, depending on the interaction of the corresponding types of project participants with the site:
 =&lt;  ,  ,  ,  ,  ,  ,  ,  ,  ,  &gt;, (17)
where  is text content as an information product or the result of providing an appropriate information service for solving a specific NLP task on the CLS website;  is a set of meaningfully generated/cached pages resulting from thematic requests/IIS of users/visitors of the CLS site;  is annotations/digests/abstracts on textual thematic content;  is a tuple of statistics of user/visitor interaction with the site;  is a tuple of the content of the profiles of regular users of the CLS according to the personalized statistics  for the corresponding generation of an individual portrait of the user/audience at certain time intervals;  is a tuple of meaningful recommended site content, personalized for a specific regular user according to the profile/actions/interaction with the CLS in certain time intervals;  is a set of content topics/headings with the possibility of renewal according to the results of the latest IIS/requests from regular site users;  is a scheme of interrelationships of textual thematic content according to the appropriate classification (current, relevant, author's, outdated, popular, similar, last-viewed, often-viewed, most viewed, longest viewed, most viewed from search engines or internal IIS, viewed by a typical group of users, etc.);  is the set of content rating results on a predetermined scale within the corresponding ranking classification;</p>
<p>is a set of marked evaluations and rankings of user comments, used as the degree of permission to publish on the site/page, if necessary with a prohibition mark forbidding a specific contributor from writing further comments, and with a ranking of all contributors by degree of trust. The list of the output flow of content, its main features, the corresponding classification, and the IT for its generation/support/analysis contribute to defining precise general functional requirements for implementing the CLS to solve any NLP problem.</p>
<p>The model of the process of linguistic analysis of the Ukrainian-language text  is presented as
 =&lt;  ,  ,  ,  ,  ,  ,  ,  ,  ,   ,   ,   ,   ,   ,   ,  , ,  ,  ,  ,  ,  ,  ,  &gt;,
where  is the input data in the CLS from various sources of information  ;  is the original relevant content from the CLS as a result of the IIS according to the requests of users/visitors;  is the process of linguistic analysis of content as a component of the IATCF subsystem;  is the process of generation/modification of the rules of operation of all modules by the moderator of the CLS;  is the process of filling an unstructured database with integrated content  ;  is the process of filling the structured database based on the processed integrated content  ;  and  are the processes of generating results according to the requests of visitors and users;  is the cache processing process for generating reports on popular requests from CLS users;  is the cache filling/modification process;  is the process of generating statistical results of the functioning of the CLS/modules and the activities of users  ;  is the operator of generation/modification of the rules of operation of all modules from the moderator of the CLS;  is the operator of filling an unstructured database with integrated content  ;  is the operator of filling the structured database based on the processed, integrated content  ;  and  are operators for generating results according to the requests of visitors and users;  is the cache processing operator for generating reports  on popular requests from users;  is the cache filling/modification operator with  data;  is the operator for generating statistical results of the functioning of the CLS/modules and user activities:
 =&lt;  ,  ,  ,  ,  , , , , , , , , ,  &gt;,  =  ∘  ∘  ∘  ∘  ∘  ∘  ∘  ∘ , (18)
where  is the input text data array;  is a tuple of the original processed text according to the purpose of the CLS;  is a set of intermediate content, which is processed at the appropriate level in the CLS;  is auxiliary dictionaries;  is a set of processing rules;  is the grapheme analysis operator (GA);  is the morphological analysis operator (MA);  is the lexical analysis operator (LA);  is the syntactic analysis operator (SA);  is the semantic analysis operator (SEM);  is the ontological analysis operator;  is the reference analysis operator;  is the structural analysis operator;  is the pragmatic analysis operator (PA).</p>
<p>The primary process of linguistic analysis of textual content is thus presented as the sequence of the grapheme, morphological, lexical, syntactic, semantic, ontological, reference, structural and pragmatic analyses.</p>
<p>and the sets of production/association rules are applied step by step: the derivation invokes rules III.1, II.1-II.4 and IV.1-IV.7 over tagged word forms such as Sж,од,н,3, Sч,од,р,3, Аж,од,н and Ач,од,р.</p>
        <p># весела посмішка твого сина наповнює мене безмежним щастям (a cheerful smile of your son fills me with boundless happiness)</p>
<p>IX. Basic morphological rules: { +  →  + ;  + и → і; о + ( , ) +  →  + ( , ) + ; с' + W → ш + W; в' + W → вл' + W; б' + W → бл' + W; д' + W → дж' + W; т' + W → ч + W; …; д + W → д' + W; с + W → с' + W; …; нн + Ф → н + о}, where  and  are arbitrary vowels;  is the designation of the sound [j] (йот); Z is any sequence not longer than 3 characters; W = -е(є)н-, -у(ю)ва-, -ова-, -овува-.</p>
        <p> = ( ,  ,  ),   =  ∘  ∘  , (23)
where  is grammar induction implementation operator;  is the operator of
identification/elimination of boundary ambiguity or sentence violation;  is operator of syntactic
parsing of phrases/sentences for building a SA tree. Rules for formulating Ukrainian phrases:</p>
<p>I. Choice of structure: { → # , ,н,   ,тепер, #}, where  is the verb group,  is the noun group,  is gender,  is singular (од) or plural (мн),  is the case, and  is the person.</p>
        <p>II. Noun group: { , , ,  →  , , ,   , ,р, ;  , , ,  →  , ,   , , , ;  , , ,  →  за,й,м,   , , ,  ;  , , ,  →  , , }.</p>
        <p>III. Verb group: { ,тепер,  →  ,тепер,   , ,зн,   , ,ор, ;  ,тепер,   , ,ор,   , ,зн, ;  ,тепер,  →  ,тепер,   , ,зн,  ;  ,тепер,   , ,ор, }.</p>
        <p>IV. Substitution of words: { ч, ,  → син , , …;  ж, ,  → посмішка , , …;  сер,у,  → щастя , , …;  хз,аойдм, ,  → я;  хз,аойдм, ,  → ти;  у,тепер,  → наповнити ,тепер, , …; веселий х, , , безмежний х, , , мій х, , , твій х, , , …}.</p>
<p>Stage 5. Semantic analysis  of the Ukrainian-language text  consists of
  = ( ,  ,  ),   =  ∘  , (24)
where  is the identification operator of lexical semantics with the generation of a collection of values of each lexeme of the text;  is the relational semantics identification operator of the interdependencies of the content of the lexemes of the text.</p>
<p>Stage 6. Reference analysis  is the identification of inter-phrase units  .</p>
        <p> = ( ,  ,  ). (25)</p>
<p>Reference analysis is often part of SEM. For Ukrainian texts, when analysing large corpora, it is best carried out as a separate stage (for example, when analysing the correspondence of a social group/community in social networks or other dialogues, to identify logical, meaningful connections between the posts of different participants, given the subjectivity of each participant's speech).</p>
<p>Stage 7. Structural analysis  of the Ukrainian-language text  is based on the degree of coincidence of lexical and terminological units across text fragments. It is often part of SEM for short texts/messages, or not used at all; for large corpora of texts it serves as an additional stage for eliminating inaccuracies flagged in SEM.</p>
        <p> = ( ,  ,  ) or   = ( ,  ,  ). (26)</p>
<p>Stage 8. Ontological analysis  of the text content  is performed on the basis of, or as part of, the results of SEM and the reference/structural analyses, if necessary:</p>
        <p> = ( ,  ,  ),   = ( ,  ,  ) or   = ( ,  ,  ). (27)</p>
<p>Stage 9. Pragmatic analysis  of the text content  is used to determine the text's structure by considering the context of sentences when forming paragraphs, sections, and dialogues. PA is an essential addition to SEM and to the reference and structural analyses when they do not suffice to eliminate flagged inaccuracies.</p>
        <p> = ( ,  ,  ,  , [ ,  ,  ], ),  =  ∘  , (28)
where  is a semantics identification operator outside individual sentences/phrases;  is the operator of text processing through higher-level NLP applications, for example, to simulate intelligent behaviour and an apparent understanding of natural language.</p>
        <p>A general scheme/model of the pipeline of the CLS operation has been developed based on
improved methods of processing information resources such as integration, maintenance and
content management, as well as the development of improved methods of intellectual and linguistic
analysis of text flow using machine learning technology (Fig. 3) [52-58]. Based on feedback from the
user and output data of the ML model, the target audience interacts with the CLS, which contributes
to the adaptation of the selected learning model. Five stages of relevant processes determine the basic
architectural principles of building a typical CLS. The methods of monitoring, developing and
managing content are interaction, formatting/filtering, NLP, ML and data accumulation in DS.
Content and support processes feature analysis, deployment, prediction, interpretation, and
content/result presentation. At the interaction stage, a set of rules for integrating content from
multiple reliable sources at certain intervals is developed. Also, in parallel, a set of rules for checking
the data entered by the user of the CLS was created as a preliminary stage for the formatting/filtering
stage according to a collection of rules and content from the DS set in advance by the moderator.
The next stage of NLP is an intermediate stage for ML and data accumulation. The ML stage is
implemented through SQL queries and modules. The support process is easier to implement than the management stage, especially when analysing the results of the NLP, in which additional lexical resources and artefacts (dictionaries, translators, regular expressions, etc.) are created, on which the effectiveness of the CLS functioning directly depends (Fig. 4) [52-58].
</p>
        <p>(Figs. 3 and 4 depict the CLS pipeline: input content, user requests and feedback flow through the CLS website; the processes of monitoring, development and management of content comprise interaction/integration/presentation, formatting/filtering (transformation, interpretation, API), NLP (normalization, prognostication, assessment), machine learning (classification, deployment, modeling), and accumulation of content/analysis of features in the data storage, matched by the content analysis and support processes of the computer linguistic system.)</p>
<p>The transition process from the raw text to the expanded ML model consists of additional content transformations. First, the input text content is transformed into the input corpus as a collection of texts, accumulated and stored in the DS. The incoming content is further grouped, filtered, formatted, linguistically processed, marked, normalized and converted into vectors for further processing. In the final transformation (Fig. 5, the process of generating an optimal machine learning model) [52-60], the model is trained on the vector corpus to create a generalized representation of the original content for further use in solving a specific NLP problem.</p>
<p>NLP methods have been improved based on the developed 82 regular expressions (RGs) for pattern matching in GA and more than 2000 RGs for the morphological analysis of Ukrainian-language texts. The primary admissible operations of RGs are the union and disjunction of symbols/chains/expressions, number and precedence operators, and anchors for the presence/absence of symbols in regular expressions. The main stages of tokenization and normalization of Ukrainian text by cascades of simple RG substitutions and finite automata are determined. Algorithms for word segmentation and normalization, sentence segmentation, and a modified Porter stemming are implemented and described as an effective way of identifying lemma affixes so that the analysed word can be marked. The modified Porter stemming algorithm is based on searching/checking the obtained intermediate results against the tree of inflexions (so as not to go through all possible inflexions) and against the content of thematic dictionaries of bases with a set of RG-rules for the identification of features (classification by parts of speech).</p>
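<p>The grapheme-level tokenization of Ukrainian text by regular expressions can be sketched as follows. This is a minimal illustration with one hand-written pattern (an assumption), whereas the actual system uses 82 GA and over 2000 MA regular expressions:</p>

```python
import re

# Ukrainian letters plus the apostrophe that may occur inside a word (м'яч)
WORD = re.compile(r"[А-ЩЬЮЯҐЄІЇа-щьюяґєії]+(?:['’][А-ЩЬЮЯҐЄІЇа-щьюяґєії]+)*")

def tokenize(text: str) -> list[str]:
    """Extract word tokens and normalize them to lowercase."""
    return [t.lower() for t in WORD.findall(text)]

print(tokenize("Весела посмішка твого сина!"))
# ['весела', 'посмішка', 'твого', 'сина']
```

<p>Punctuation and other non-alphabetic characters simply fall outside the pattern, so no separate stripping pass is needed.</p>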
<p>Stage 1. Identify the next lexeme as the word  ( =  ).</p>
        <p>Stage 2. Check with the stop-word dictionary whether  is a service word. If yes, then  =  + 1 and go to stage 1. Otherwise, go to stage 3.</p>
        <p>Stage 3. Go to the end of the word  . Recognize the inflexion  in  from all possible ones (the longest one is chosen; for example, in  =текстова we choose the ending  =ова, not  =а) from the RG of the word type  ,  , or  , and, if present, delete the inflexion  .</p>
        <p>(Fig. 5: generation of the ML model — forming the feature set, choice of the ML model, adjustment of parameters, model control, lexical resources.)</p>
        <p>Stage 4. Saving the inflexion  in the word tag  .</p>
      </sec>
      <sec id="sec-3-3">
        <title>Stage 5. Label  as type</title>
<p> ,  or  , respectively.</p>
<p>Stage 6. Finding the deleted inflexion  in the tree of inflexions  (the longest one is chosen). Checking the contents of the subtree  against the existing word ending  ( =  +  ). If  ends in  and has a counterpart in  , then we store it in  =  and delete it in  .</p>
<p>Stage 7. We check the obtained base  of the initial word  against the content of the dictionary of bases  of words of the Ukrainian language. If there is no match, we store &lt;  ,  &gt; in the additional temporary intermediate dictionary  for the moderator and proceed to stage</p>
      </sec>
      <sec id="sec-3-4">
        <title>1. Otherwise, proceed to stage 4.</title>
<p>Stage 8. Analysis of the inflexion and the presence/absence of alternation of letters in the base/inflexions of the words &lt;  ,  &gt; and the analogue of the word base in  according to the corresponding RG-rule of MA, to identify additional features of the analysed word  .</p>
<p>Stage 9. Adding the identified linguistic features of the recognized part of speech to the tag of the word  of the type  ,  , or  , respectively. Saving the results in the corresponding dictionary  of the analysed text.</p>
<p>Unlike the classic Porter algorithm, the modified one is adapted specifically for the Ukrainian language and gives an accurate result in 85-93% of cases, depending on the quality, style and genre of the text and, accordingly, on the content of the CLS dictionaries. In total, about 1,300 rules for processing suffixes and endings, taking the alternation of letters into account, have been implemented for the MA of Ukrainian-language nouns, along with 99 RG-rules for adjectives and more than 800 RG-rules for verbs. The algorithm for the minimum edit distance between lines of Ukrainian texts is described as the minimum number of operations required to transform one string into another. Also, an algorithm for calculating the maximum-likelihood metric for the 2-gram and 3-gram models based on the analysis of word bases was developed to identify stable word combinations as keywords. To forecast the conditional probability of the next word base, we use the Markov assumption (the probability of a word depends on the previous one).</p>
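<p>The minimum edit distance described above matches the classic dynamic-programming (Levenshtein) computation; a compact sketch:</p>

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to transform string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[len(b)]

print(edit_distance("кіт", "кит"))  # -> 1 (one substitution)
```

<p>Only two rows of the table are kept, so the memory cost is linear in the length of the shorter string.</p>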
<p>Moreover, suppose the keywords are a set of nouns or an adjective with a noun. In that case, other words, such as verbs, participles, etc., are treated as additional separators, like punctuation marks that demarcate persistent phrases as potential keywords. The order of bases is not crucial for the Ukrainian language.</p>
<p>Stage 1. Process the input text and break it into separate phrases (sentences)  , marking each start/end with the corresponding &lt;p&gt; &lt;/p&gt; tag. Eliminate all non-alphabetic characters. Convert uppercase letters to lowercase. Remove service words if necessary (for certain NLP tasks).</p>
<p>Stage 2. Apply Porter's stemming to obtain the sequence of word stems   …  of the words  …  , taking word normalization into account, respectively.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Stage 3. Receive input queries</title>
<p> …  as a sequence of words of the searched data. Find the basis  of each word   …  by stemming.</p>
      </sec>
      <sec id="sec-3-6">
        <title>For example, for the search phrase  :</title>
        <p>Translation - Method and tools for information systems processing in electronic content commerce
systems</p>
<p>ресурс, метод, та, засіб, опрац, інформ, систем, електрон, контент, комерц,</p>
        <p>where  =&lt;  ,  ,  ,  ,  ,  ,  &gt; is a tuple of simple sentence generation properties.</p>
<p>where  is a tuple of lexical signs of phrase generation;  is a tuple of syntactic signs of phrase generation;  is a tuple of noun properties;  is a tuple of adjectival properties;  is a tuple of properties of numerals;  is a tuple of pronominal properties;  is a tuple of verb properties;  is a tuple of adverbial properties;  is a tuple of coordinate properties;  is a tuple of subordinate properties;  is a tuple of ordinal properties, where  is a tuple of the properties of a disjunctive connection,  is a tuple of the properties of a conjunctive connection, and  is a tuple of the properties of an adversative connection;  is a tuple of agreement properties;  is a tuple of government properties;  is a tuple of adjacency properties. A tuple of sentence generation concepts:  =&lt;  ,  ,  ,  &gt;, where  is a tuple of sentence generation properties;  is a tuple of clause identification properties;  is a tuple of narrative sentence generation properties;  is a tuple of properties for generating interrogative sentences;  is a tuple of imperative sentence generation properties;  is a tuple of properties for generating emotionally neutral sentences;  is a tuple of properties for generating emotional sentences; a tuple of concepts covers the formation of  simple and  complex sentences;  is a tuple of properties identifying the main members of the sentence;  is a tuple of properties identifying the secondary members of the sentence;  =&lt;  ,  &gt;;  is a tuple of properties for generating affirmative sentences;  is a tuple of negative sentence generation properties. To generate a simple sentence  , the following features are analyzed:</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments, results and discussion</title>
<p>We analyse the results of the experimental approbation of the developed methods and means of the linguistic and intellectual analysis of Ukrainian-language texts, based on the developed methods for identifying keywords, determining persistent word combinations, thematically classifying text and detecting text duplication. Let us consider the peculiarities of the
process of syntactic analysis of Ukrainian-language textual content aimed at identifying significant
keywords of input texts. Having determined the role and formal features of the syntactic analyser in
the process of identifying keywords of the content topic, the procedures of the proposed method
were decomposed into two stages (Table 1), where A (total keywords identified with a given word
weight), B (generated significant words without pronoun and verbs), C (coincidence of words with
the author's list), D (accuracy of the coincidence of identified keywords with the author's list), E
(additionally defined keywords, but not determined by the author of the publication). In stage 1, the
research for step 1 (analysis of full articles) and step 2 (articles without metadata such as abstract,
author keywords and list of references) was carried out without the application of ML, and in stage
2 - with ML. The method of article analysis without metadata achieves the best results according to
the density criterion. The author of the article often defines a more significant number of words ( )
and a smaller number of keywords ( ) than are present in the text of the scientific and technical
publication (Fig. 6). Unlike known parsers, the proposed method provides self-improvement and
self-learning of the keyword definition module due to the identification mechanism of significant
statistical parameters within the limits defined by the moderator. A system has been developed on
the Victana website, which allows users to choose from a list of languages of the analysed text
(http://victana.lviv.ua/index.php/kliuchovi-slova). The value of  differs from the value of  by
0.69 (by number, but not by content);  from  by 1.74;  from  by 2.66;  from  by 3.58.
The value of  differs from the value of  by 4.36; respectively,  from  by 3.31;  from  by
2.39;  from  by 1.47. Adaptively changing the parameters/rules of the module almost doubles
the collection of identified keywords (for example, the value of  is greater than  by 1.144654; 
by 1.750524;  by 1.557652;  by 1.36478). The total increase in value obtained depending on the
moderation of dictionaries is, respectively, for  is 14.46541;  is 36.47799;  is 55.7652;  is
75.05241. When comparing  is greater than  ÷  and we have a chain of such values as 1.7985;
1.5084; 1.3217; and 1.176.</p>
      <p>For different stages and steps of the experiment of processing the primary text, the average
coincidence of the lists of discovered keywords with the author's keywords varies in the range of
52.6-68.5%. The accuracy of matching keywords with the author's keywords ranges from 43.6 to
62.9%. The average match of meaningful keywords compared to all found by the system ranges from
38.9-75.8%, depending on the stages of analysis of article texts. The accuracy of matching keywords
compared to all found by the system varies between 34.3-71.9%, depending on the stages of analysis
of article texts. For  , the module most often identified the number of keywords {5, 7, 3} (10),
although the distribution of found keywords was within [1;18] words (except 17).</p>
<p>For  , the module most often identified the number of keywords also {5, 7, 3}, although the distribution of found keywords is within [1;18] (except 17); the number of identified words increased,</p>
        <p>(Figs. 6-8: for each keyword weight from 1 to 5, the panels a)-d) plot, for both the author's keywords and the words defined by the system, the total words, meaningful words, coincidence with the author's keywords, match accuracy, and additional words.)</p>
<p>and the highest reliability index was achieved. For  , the module most often identified the number of keywords {7, 6, 5, 10, 8}, although the distribution of found keywords was within [2;14] (the range narrowed significantly). For  , the module most often identified the number of keywords {8, 5, 7, 10}, with the distribution of identified keywords within [3;16] (accuracy improved). The accuracy of keyword identification increases with the moderation of the dictionaries and the ML module: the difference between the number of keywords defined by the author and identified by the module is 44.39919%, improving to 33.70672% and then, significantly, to 24.33809%. Analysis was performed for filtered texts without metadata and for unfiltered texts. The average values obtained for filtered texts ( = 0.28) and unfiltered texts ( = 0.19) show that filtering scientific articles improves keyword density by 1.48 times, or 47.83% (Fig. 9a). The values obtained for the texts ( = 0.34 and  = 0.25), taking into account the refinement of the thematic dictionary through ML and the replenishment of blocked words, show that filtering with simultaneous moderation of the thematic dictionary improves keyword density by 1.35 times, or by 35.44% (Fig. 9b). A comparison of the values in the original author's text  =</p>
<p>= 0.25 without/with the refinement of the thematic dictionary, respectively, demonstrates the effectiveness of the moderation of the thematic dictionary on the initial text: the density of keywords increases 1.34 times, or by 34.33% (Fig. 10a). A comparison of the values in the filtered author's text,  = 0.28 and  = 0.34 without/with the refinement of the thematic dictionary, respectively, demonstrates the effectiveness of the moderation of the thematic dictionary on the filtered text: the density of keywords increases 1.23 times, or by 23.14% (Fig. 10b).</p>
<p>(Fig. 10 legend: general text with/without the refined dictionary; filtered text with/without dictionary refinement.)</p>
<p>So, the experimental study confirmed the method's reliability: for the different stages of processing the primary text, the average coincidence of the lists of identified keywords with the author's keywords varies in the range of 52.6-68.5% (by 9%). The accuracy of matching keywords with the author's keywords ranges from 43.6 to 62.9%. The average match of meaningful keywords compared to all found by the system ranges from 38.9-75.8%, depending on the stages of analysis of the article texts. The accuracy of matching keywords compared to all found by the system varies between 34.3-71.9%, depending on the stages of analysis of the article texts. A method of determining stable word combinations when identifying textual content keywords in reference passages of the author's text has been developed. The process applies Zipf's law to the formation of stable word combinations as keywords, taking into account the following rules of preliminary linguistic processing of the text: remove all stop words; form bigrams only within the limits of punctuation marks and of words that are not verbs or pronouns (the latter are treated as punctuation marks); determine verbs by their inflexions; form bigrams from word bases without taking their inflexions into account; determine adjectives by their inflexions and require that an adjective occupy only the first place in a bigram from Ukrainian-language texts. A module has been developed to identify persistent phrases as keywords in textual content. An approach to developing linguistic content-analysis software for determining stable word combinations when identifying keywords of Ukrainian-language and English-language textual content is proposed. The peculiarity of the approach is the adaptation of the linguistic, statistical analysis of lexical units to the peculiarities of the constructions of Ukrainian and English words/texts. The results of the experimental approbation of the proposed method of content analysis of English- and Ukrainian-language texts for determining stable word combinations when identifying keywords of technical texts were studied.</p>
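<p>The 2-gram maximum-likelihood estimate over word bases, with bigrams formed only inside separator-delimited segments, can be sketched as follows. Stemming and separator handling are simplified here, and the segment data are invented for illustration:</p>

```python
from collections import Counter

def bigram_scores(segments: list[list[str]]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood P(w2 | w1) = count(w1, w2) / count(w1),
    computed only inside segments (bigrams never cross separators)."""
    unigrams, bigrams = Counter(), Counter()
    for seg in segments:
        unigrams.update(seg)
        bigrams.update(zip(seg, seg[1:]))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

# Word bases of two phrases; verbs and punctuation already split the segments.
segments = [["систем", "електрон", "контент", "комерц"],
            ["систем", "електрон", "комерц"]]
scores = bigram_scores(segments)
print(scores[("систем", "електрон")])  # -> 1.0: a candidate stable combination
```

<p>Bigrams whose conditional probability stays high across the corpus are promoted to stable word combinations and, hence, candidate keywords.</p>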
      <p>A method of identifying the style of the author of a text based on the analysis of linguistic
speech coefficients in a standard (reference) text has been developed. The technique consists of a
comparative study of the author's attribution in the author's statistically processed work (the
standard) against an arbitrarily chosen analysed passage. The method evaluates the probability that
the text of an article belongs to the author of the benchmark by analysing the relevant coefficients
of lexical speech: the concentration of the text, the coherence of the speech, the uniqueness of the
text, the syntactic complexity of the speech and the linguistic diversity of the speech. The degree
of speech connectivity does not decrease significantly: in 2001 it varied within [0.5; 1.2], and in
2021 within [0.4; 0.9] (Fig. 11). Moreover, the method works under the condition that the author's
standard has already been studied; the NLP task is to form the author's frequency dictionary,
including service/stop words.</p>
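      <p>The lexical coefficients named above can be illustrated with common textbook-style definitions (the dissertation's exact formulas may differ):
```python
from collections import Counter

def speech_coefficients(text):
    """Illustrative lexical speech coefficients; assumed variants of the
    quantities named in the method, not its exact formulas."""
    words = text.lower().split()
    n = len(words)
    freq = Counter(words)
    unique = len(freq)
    hapax = sum(1 for c in freq.values() if c == 1)      # words used once
    frequent = sum(1 for c in freq.values() if c >= 10)  # high-frequency words
    return {
        "diversity": unique / n,        # linguistic diversity of speech
        "exclusivity": hapax / n,       # uniqueness of the text
        "concentration": frequent / n,  # concentration of the text
    }
```
Comparing these values for a questioned passage against the author's standard yields the probability estimate the method relies on.</p>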
      <p>An algorithm for determining stop words of textual content based on linguistic analysis has
been developed. For the author's individual style, the markers are service/stop words (for example,
particles, conjunctions, prepositions, filler words, slang, etc.) unrelated to the article's topic.
The absolute and relative frequencies of stop words were analysed and compared with the reference
values for each excerpt. Applying the method of reference words therefore gives the following result:
finding, among the studied passages, the one that most likely belongs to the standard. Other results
also confirm the effectiveness of the keyword method in the author attribution of texts. The proposed
assumption that the share, as a parameter of the process, has an insignificant influence on the
results led to a decrease in the correlation coefficients but placed the probabilities of belonging
to the standard for the passages in the correct order (Table 2). Excerpt 4 most likely belongs to the
author of the template (although there is no significant difference between results 4 and 2: if they
were written in the same period, they do not belong to the author of the template; if in different
periods with the template, the probability of belonging to this author increases).</p>
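      <p>The comparison of an excerpt's stop-word frequency profile with the author's reference values, as described above, can be sketched as follows (the Pearson correlation and the sample stop words are illustrative assumptions):
```python
from collections import Counter
from math import sqrt

def stopword_profile(text, stopwords):
    # Relative frequency of each stop word in the text, in a fixed order.
    words = text.lower().split()
    n = len(words)
    counts = Counter(w for w in words if w in stopwords)
    return [counts[s] / n for s in sorted(stopwords)]

def correlation(xs, ys):
    # Pearson correlation between two frequency profiles.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy) if vx * vy else 0.0
```
The excerpt whose profile correlates most strongly with the standard's profile is the one most likely belonging to the reference author.</p>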
      <p>An algorithm for the linguistic analysis of Ukrainian-language texts and a syntactic analyser of
text content has been developed. The features of the algorithm are the adaptation of morphological
and syntactic analysis of lexical units to the peculiarities of constructions of Ukrainian words/texts.
Algorithms are tested to identify significant stopwords in Ukrainian-language text based on regular
expressions. When parsing words belonging to a part of speech, declension within this part of speech
was taken into account. For this purpose, word inflexions were analysed for classification, selection
of the basis and formation of the corresponding alphabetic-frequency dictionaries. The dictionaries
contents were subsequently taken into account in the next steps of determining the text's authorship
by calculating the parameters and coefficients of the author's speech. Software implementations for
solving several NLP problems were developed, for the study of:
keywords (https://victana.lviv.ua/kliuchovi-slova);
stable phrases (https://victana.lviv.ua/nlp/stiiki-slovospoluchennia);
classification of textual content (https://victana.lviv.ua/kliuchovi-slova);
quantitative evaluations of speech (https://victana.lviv.ua/nlp/linhvometriia);
the author's style, based on calculations of stylometry coefficients and their comparison with
the corresponding coefficients in the standard text (https://victana.lviv.ua/nlp/stylemetriia);
differences in text signs (https://victana.lviv.ua/nlp/hlotokhronolohiia);
features of the style of texts based on N-grams (https://victana.lviv.ua/nlp/n-grams).</p>
      <p>The results of the experimental approbation of the proposed content monitoring method for
determining the author in Ukrainian-language scientific texts of a technical profile were studied. A
comparison of the results of more than 300 single-author works of a technical direction by 100 different
authors for 2001–2021 was carried out to determine whether and how the coefficients of text
diversity of these authors change in different periods. A method of identifying the potential
(probable) author of a Ukrainian-language text based on the analysis of the author's linguistic speech
coefficients in a reference passage of the author's text has been developed. Decomposition of the
method of determining the author was carried out based on the analysis of such speech coefficients
as speech coherence, degree of syntactic complexity, linguistic diversity, indices of concentration
and exclusivity of the text. In parallel, such parameters of the author's style were analysed as the
number of words in a specific text, the total number of words in this text, the number of sentences,
the number of prepositions, the number of conjunctions, the number of words with a frequency of 1
and the number of words with a frequency of 10 or more, as well as keywords and 3-grams. For
example, 3-grams
of 3 articles were analysed [61-63] (Ukrainian versions). For the most frequently used letters, the
frequency of appearance of 3-grams with such initial letters will have an almost identical distribution
(peak values in Fig. 12a), but not for other letters. Therefore, it is expedient to study only 3-grams
for initial letters that occur less often in the texts of a specific language to determine the degree
to which a text belongs to the corresponding author (for example, Fig. 12b). According to these graphs,
it appears that Articles (1,2) are most likely written by the same author, although the same author
could also have written Articles (1,3) (but this is not the case). Articles (2,3) are written by
different authors.
Applying linguistic, statistical analysis of 3-grams to a set of articles makes it possible to form a
subset of publications similar in terms of linguistic characteristics. Imposing additional conditions in
the form of linguistic, statistical analyses (a set of keywords, stable word combinations (Table 3),
stylometric, linguometric, etc.) will significantly reduce the subset, clarifying the list of more likely
authors' works. Thus, the analysis of the content and frequency of appearance of only official words
separates Articles (1,3) into different subsets, leaving Articles (1,2) in one. 78.4814% of 3-grams were
analysed for Article 1, 72.6332% for Article 2, and 84.1271% for Article 3. The difference in the use of
the corresponding 3-grams between Articles (1,2) is R12=56.5254%, between Articles (2,3) –
R23=69.4271%, between Articles (1,3) – R13=62.9839%. Accordingly, Articles (1,2) are more similar by
[6-12]% (R23&gt;R12 by 12.9017%, R23 &gt; R13 by 6.4432%, R13&gt; R12 by 6.4585%, i.e. R23&gt;R13&gt;R12) than
Articles (1,3) and (2,3). The smaller the Rij, the more likely it is that the same author wrote the
articles. Hence Articles (1,2) are more likely to have been written by one author/team than Articles
(2,3) and (1,3), respectively.</p>
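      <p>The pairwise difference Rij between two articles' 3-gram usage can be sketched as follows (the exact formula is not given in the text, so a total-variation-style distance over trigram distributions is assumed):
```python
from collections import Counter

def trigram_distribution(text):
    # Relative frequencies of letter 3-grams in the text.
    s = "".join(ch for ch in text.lower() if ch.isalpha())
    grams = Counter(s[i:i + 3] for i in range(len(s) - 2))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def difference(p, q):
    """Share of trigram mass used differently by two texts, in percent.
    Smaller values suggest the same author, per the method above."""
    keys = set(p) | set(q)
    return 100.0 * 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```
Computed over all article pairs, the measure orders candidates exactly as the Rij values in the text are compared.</p>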
      <p>When identifying the author of a text, it is assumed that the text reflects the author's style of
writing, which makes it possible to distinguish them from others. To compare texts with each other,
it is necessary to compare numerical characteristics of the text that would be close for texts by the
same author and differ significantly for the works of different authors. Such a
characteristic can be the density of the distribution of letter combinations of three consecutive
symbols (3-grams). During the experimental testing based on the developed four different algorithms
for calculating the degree of verification of the author of the Ukrainian-language text from a set of
possible values, values were obtained that confirm that the style of the authors numbered x and y is
quite close (more than 90%) to the style of collective works 1–4, respectively. In addition, the set
of candidate authors with a similar speech style was significantly reduced (from 42.02% to 34.04% of
the total 100 project participants across more than 300 articles). Figure 13 presents graphs of the results
obtained when applying algorithms to analyse the method developed to determine the author's style.</p>
      <p>Further, for the 34.04% of authors remaining after the first stage, an analysis of the stop
words and keywords of their works was used to determine the author's style. Each individual has their
own vocabulary for conveying thought, including so-called "parasitic" words ("that is", "therefore",
"although", etc.) and service words (conjunctions and particles such as "and", "but", "although", etc.).
Figure 14 presents an example of the analysis of the author's style in the second stage by analysing
the frequency of appearance of service words and keywords, considering various filters.
Therefore, a method of determining the style of the author of thematic Ukrainian-language textual
content was developed based on the analysis of keywords, stable word combinations, N-grams,
linguometry and stylometry, which made it possible to determine the stylistic contribution of each of
the authors and increase the accuracy of attribution of a scientific and technical publication by 6%.
A method for calculating the degree of verification of the author of a Ukrainian-language text from
a set of possible ones based on a comparative analysis of the styles of potential authors has also been
developed, which made it possible to increase the accuracy of classification by style similarity by 7%.</p>
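      <p>The comparative analysis of the styles of potential authors can be sketched as a nearest-reference search over stylometric feature vectors (the feature set and the use of cosine similarity are assumptions for illustration):
```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu * nv else 0.0

def most_likely_author(questioned, references):
    # references: author name mapped to a stylometric feature vector
    # (e.g. diversity, exclusivity, concentration, syntactic complexity).
    return max(references, key=lambda a: cosine(questioned, references[a]))
```
The questioned text is attributed to the reference author whose stylometric vector it resembles most.</p>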
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The work solves an important scientific and applied problem of analysis and synthesis of CLS for
solving various problems of processing Ukrainian-language textual content based on the
development of new and improvement of known models, methods and tools of NLP:</p>
      <p>An analysis of the current state and prospects of IT development of natural language
processing was carried out, which made it possible to define the problem and research tasks,
as well as to form general research directions, given the absence both of non-commercial
open-source CLS software for processing Ukrainian-language textual content and of a standardized
design approach.</p>
      <p>The relevance of solving the problem of analysis and synthesis of CLS based on developing the
general structure of a system for processing Ukrainian-language textual content is substantiated.
The interaction of the main IS processes/components with methods of linguistic processing of textual
content adapted to the Ukrainian language (grapheme, morphological, lexical, syntactic, semantic,
structural, ontological and pragmatic analysis) made it possible to improve the IT of intellectual
analysis of the text flow for a specific NLP task. It ensured the adaptation of NLP processes to the
analysis of Ukrainian-language textual content and, on that basis, increased the accuracy of the
obtained results by 6-48%, depending on the specific task. For example, for the NLP
task of determining the Ukrainian-language text keywords, the density of keywords
increases in the range [1.23; 1.48] times or by [23.14; 47.83]% depending on filling the
thematic dictionary quality/accuracy through machine learning.</p>
      <p>The methods of processing information resources, such as integration, management and
support of Ukrainian-language content, were improved, which made it possible to adapt the
process of intellectual analysis of the text flow and develop metrics of the effectiveness of the
CLS functioning for the solution of various tasks of the NLP. The developed methods and
tools make it possible to build a CLS for processing Ukrainian-language text content
according to the needs of the permanent/potential target audience based on the analysis of
the history of actions of website users.</p>
      <p>The NLP methods based on regular expressions of pattern matching were improved, which
made it possible to adapt the methods of tokenization and text normalization by cascades of
simple substitutions of regular expressions and finite state machines.</p>
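      <p>A cascade of simple regular-expression substitutions for tokenization and normalization, as described, might look like this (the particular patterns are illustrative):
```python
import re

# Each step is a simple substitution; applied in order, they form a cascade.
CASCADE = [
    (re.compile(r"https?://\S+"), " "),      # strip URLs
    (re.compile(r"[0-9]+"), " NUM "),        # normalize numbers to one token
    (re.compile(r"[^\w\s'-]"), " "),         # drop punctuation
    (re.compile(r"\s+"), " "),               # collapse whitespace
]

def normalize(text):
    text = text.lower()
    for pattern, repl in CASCADE:
        text = pattern.sub(repl, text)
    return text.strip().split()
```
Because each stage is a plain substitution, the whole cascade is equivalent to a composition of finite state transducers, which is what makes this approach efficient.</p>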
      <p>The MA (morphological analysis) method for Ukrainian-language text, based on word
segmentation and normalization, sentence segmentation and a modified Porter stemming algorithm,
was improved as an effective tool for identifying lemma affixes so that the analysed word can be
marked up, which made it possible to increase the accuracy of keyword searches by 9%.
The IT of the intellectual analysis of the text flow was improved based on the processing of
information resources, which made it possible to adapt the general structure of modules for
integration, management and support of content to solve various tasks of the NLP and
increase the efficiency of the operation of the CLS by 6-9%. It became possible thanks to the
combination of methods of linguistic analysis adapted to the Ukrainian language, improved
IT processing of information resources, ML, and a set of metrics for evaluating the
effectiveness of the CLS's functioning. The main principle of building such CLS is modularity,
which facilitates their construction by requiring the availability of appropriate processes for
solving a specific NLP problem.</p>
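      <p>The modified Porter-style stemming mentioned above can be sketched as iterative suffix stripping (the suffix list is a small illustrative subset, not the dissertation's full inventory):
```python
# A few common Ukrainian inflexional suffixes, matched longest first
# (illustrative subset only).
SUFFIXES = ["ться", "ами", "ові", "ого", "ому", "ись", "ння", "ий", "ій",
            "ою", "ти", "ах", "ів", "а", "и", "і", "о", "у", "ю", "я"]

def stem(word, min_stem=3):
    """Strip one matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[:len(word) - len(suf)]
    return word
```
The minimum-stem-length guard plays the role of Porter's measure condition: it prevents short function words from being stripped to nothing.</p>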
      <p>A method of determining the author in Ukrainian-language texts has been developed based
on the analysis of the coefficients of the author’s lexical speech in the reference passage of
the author’s text, which is based on the study of a collection of keywords, persistent phrases,
indicators of linguometry and stylometry, as well as the results of N-gram analysis (based on
comparisons of differences in 2-gram and 3-gram usage: for publications similar in style the
difference lies in the range of [6; 7]%, and for clearly dissimilar ones it exceeds 12%), which made
it possible to determine a set of potential authors of publications from more than one candidate
author (up to [9; 34]% of the
total number of project participants) and develop a method for identifying the author's style.
A method of determining stable word combinations was developed based on the
identification of keywords of the Ukrainian-language text and the analysis of the linguistic
speech coefficients of the author of the text in reference excerpts of the content, which made
it possible to improve the accuracy of the method of determining the style of the author of
the text by 9% based on statistical linguistics.</p>
      <p>Relevant materials confirm the reliability of the scientific and practical results of the
dissertation research by comparing the practical results obtained on different samples of reliable
input data. The CLS was developed on the information resource http://victana.lviv.ua using CMS
Joomla (for designing the e-framework of articles), PHP
(for implementing text content processing methods), HTML (for implementing page
markup), CSS (for describing page styles), and MySQL (for storing data and dictionaries). The
experimental study confirmed the reliability of the method of identifying keywords - for
different algorithms for processing the primary text, the average match between the lists of
identified keywords and the author's keywords varies in the 52.6-68.5% range. The accuracy
of matching keywords with the author's keywords ranges from 43.6 to 62.9%. The average
match of meaningful keywords compared to all found by the system ranges from 38.9-75.8%,
depending on the stages of analysis of article texts. The accuracy of matching keywords
compared to all found by the system varies between 34.3-71.9%, depending on the stages of
analysis of article texts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The research was carried out with the grant support of the National Research Fund of Ukraine,
"Information system development for automatic detection of misinformation sources and inauthentic
behaviour of chat users ", project registration number 187/0012 from 1/08/2024 (2023.04/0012). Also,
we would like to thank the reviewers for their precise and concise recommendations that improved
the presentation of the results obtained.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[1] I. Lauriola, A. Lavelli, F. Aiolli, An introduction to deep learning in natural language processing:</p>
      <p>Models, techniques, and tools, Neurocomputing 470 (2022) 443-456.
[2] Y. Kang, Z. Cai, C. W. Tan, Q. Huang, H. Liu, Natural language processing (NLP) in management
research: A literature review, Journal of Management Analytics 7(2) (2020) 139-172.
[3] L. Hickman, S. Thapa, L. Tay, M. Cao, P. Srinivasan, Text preprocessing for text mining in
organizational research: Review and recommendations, Organizational Research Methods 25(1)
(2022) 114-146.
[4] D. Hu, An introductory survey on attention mechanisms in NLP problems, in: Proceedings of
the Intelligent Systems Conference on Intelligent Systems and Applications 2 (2020) 432-448.
[5] M. Gardner, W. Merrill, J. Dodge, M. E. Peters, A. Ross, S. Singh, N. A. Smith, Competency
problems: On finding and removing artifacts in language data, arXiv preprint arXiv:2104.08646,
2021.
[6] L. Wu, et al., Graph neural networks for natural language processing: A survey, Foundations
and Trends in Machine Learning 16(2) (2023) 119-328.
[7] E. Fedorov, O. Nechyporenko, Linguistic Constructions Translation Method Based on Neural</p>
      <p>Networks, CEUR Workshop Proceedings 3396 (2023) 295-306.
[8] M.-A. Lefer, N. Grabar, Super-creative and over bureaucratic: A cross-genre corpus based study
on the use and translation of evaluative prefixation in ted talks and EU parliamentary debates,
Across Languages and Cultures 16(2) (2015) 187–208.
[9] M. Konyk, V. Vysotska, S. Goloshchuk, R. Holoshchuk, S. Chyrun, I. Budz, Technology of
Ukrainian-English Machine Translation Based on Recursive Neural Network as LSTM, CEUR
Workshop Proceedings 3387 (2023) 357-370.
[10] N. Shakhovska, I. Shvorob, The method for detecting plagiarism in a collection of documents,
in: Proceedings of the International Conference on Computer Sciences and Information
Technologies, CSIT, 2015, pp. 142-145.
[11] O. Karnalim, G. Kurniawati, Programming Style on Source Code Plagiarism and Collusion</p>
      <p>Detection, International Journal of Computing 19(1) (2020). 27-38.
[12] V. Vysotska, Y. Burov, V. Lytvyn, A. Demchuk, Defining Author's Style for Plagiarism Detection
in Academic Environment, in: Proceedings of the International Conference on Data Stream
Mining and Processing, DSMP, 2018, pp. 128-133.
[13] O. Barkovska, V. Kholiev, A. Havrashenko, D. Mohylevskyi, A. Kovalenko, A Conceptual Text
Classification Model Based on Two-Factor Selection of Significant Words, CEUR Workshop
Proceedings 3396 (2023) 244-255.
[14] A. Berko, Y. Matseliukh, Y. Ivaniv, L. Chyrun, V. Schuchmann, The text classification based on
Big Data analysis for keyword definition using stemming, in: Proceedings of the IEEE 16th
International conference on computer science and information technologies on Computer
science and information technologies, Lviv, Ukraine, 22–25 September, 2021, pp. 184–188.
[15] V. Lytvyn, V. Vysotska, I. Budz, Y. Pelekh, N. Sokulska, R. Kovalchuk, L. Dzyubyk, O.</p>
      <p>Tereshchuk, M. Komar, Development of the quantitative method for automated text content
authorship attribution based on the statistical analysis of N-grams distribution,
Eastern-European Journal of Enterprise Technologies 6(2(102)) (2019) 28–51.
doi: 10.15587/1729-4061.2019.186834.
[16] I. Khomytska, I. Bazylevych, V. Teslyuk, I. Karamysheva, The chi-square test and data clustering
combined for author identification, in: Proceedings of the IEEE XVIIIth Scientific and Technical
Conference on Computer Science and Information Technologies, 2023.
[17] I. Khomytska, V. Teslyuk, The Multifactor Method Applied for Authorship Attribution on the</p>
      <p>Phonological Level, CEUR workshop proceedings 2604 (2020) 189-198.
[18] R. Romanchuk, V. Vysotska, V. Andrunyk, L. Chyrun, S. Chyrun, O. Brodyak, Intellectual
Analysis System Project for Ukrainian-language Artistic Works to Determine the Text
Authorship Attribution Probability, in: Proceedings of the International Scientific and Technical
Conference on Computer Sciences and Information Technologies, 2023.
[19] I. Khomytska, V. Teslyuk, A. Holovatyy, O. Morushko, Development of methods, models, and
means for the author attribution of a text, Eastern-European Journal of Enterprise Technologies
3(2(93)) (2018) 41–46. doi: 10.15587/1729-4061.2018.132052.
[38] A. Yarovyi, D. Kudriavtsev, Method of Multi-Purpose Text Analysis Based on a Combination of</p>
      <p>Knowledge Bases for Intelligent Chatbot, CEUR Workshop Proceedings 2870 (2021) 1238-1248.
[39] N. Shakhovska, O. Basystiuk, K. Shakhovska, Development of the Speech-to-Text Chatbot</p>
      <p>Interface Based on Google API, CEUR Workshop Proceedings 2386 (2019) 212-221.
[40] T. Basyuk, A. Vasyliuk, Peculiarities of an Information System Development for Studying
Ukrainian Language and Carrying out an Emotional and Content Analysis, CEUR Workshop
Proceedings 3396 (2023). URL: https://ceur-ws.org/Vol-3396/paper23.pdf.
[41] V. Vysotska, S. Holoshchuk, R. Holoshchuk, A Comparative Analysis for English and Ukrainian
Texts Processing Based on Semantics and Syntax Approach, CEUR Workshop Proceedings 2870
(2021) 311-356.
[42] A. Dmytriv, S. Holoshchuk, L. Chyrun, R. Holoshchuk, Comparative Analysis of Using Different
Parts of Speech in the Ukrainian Texts Based on Stylistic Approach, CEUR Workshop
Proceedings 3171 (2022) 546-560.
[43] S. Yevseiev, et al., Development of a Method for Determining the Indicators of Manipulation
Based on Morphological Synthesis, Eastern-European Journal of Enterprise Technologies 117(9)
(2022).
[44] O. Cherednichenko, O. Kanishcheva, O. Yakovleva, D. Arkatov, Collection and Processing of a</p>
      <p>Medical Corpus in Ukrainian, CEUR Workshop Proceedings 2604 (2020) 272-282.
[45] A. Dmytriv, V. Vysotska, M. Bublyk, The Speech Parts Identification for Ukrainian Words Based
on VESUM and Horokh Using, in: Proceedings of the 16th International Conference on
Computer Sciences and Information Technologies (CSIT), vol. 2, 2021, September, pp. 21-33.
[46] V. Vysotska, S. Mazepa, L. Chyrun, O. Brodyak, I. Shakleina, V. Schuchmann, NLP Tool for
Extracting Relevant Information from Criminal Reports or Fakes/Propaganda Content, in:
Proceedings of the IEEE 17th International Conference on Computer Sciences and Information
Technologies (CSIT), 2022, November, pp. 93-98.
[47] M. Lupei, O. Mitsa, V. Sharkan, S. Vargha, N. Lupei, Analyzing Ukrainian Media Texts by Means
of Support Vector Machines: Aspects of Language and Copyright, in: Proceedings of the
International Conference on Computer Science, Engineering and Education Applications, 2023,
March, pp. 173-182.
[48] V. Vysotska, Analytical Method for Social Network User Profile Textual Content Monitoring
Based on the Key Performance Indicators of the Web Page and Posts Analysis, CEUR Workshop
Proceedings 3171 (2022) 1380-1402.
[49] K. Shakhovska, I. Dumyn, N. Kryvinska, M. K. Kagita, An approach for a next-word prediction
for Ukrainian language, Wireless Communications and Mobile Computing 2021 (2021) 1-9.
[50] I. Demydov, Architecture of the Computer-linguistic System for Processing of Specialized
Web-communities’ Educational Content. URL: https://ceur-ws.org/Vol-2616/paper1.pdf.
[51] V. Vysotska, Ukrainian participles formation by the generative grammars use, CEUR Workshop</p>
      <p>Proceedings 2604 (2020) 407–427.
[52] B. Bengfort, R. Bilbro, T. Ojeda, Applied text analysis with Python: Enabling language-aware
data products with machine learning, O'Reilly Media, Inc., 2018.
[53] D. Jurafsky, J. H. Martin, Speech and Language Processing. URL:
https://web.stanford.edu/~jurafsky/slp3/ed3book_sep212021.pdf.
[54] D. Jurafsky, J. H. Martin, Regular Expressions, Text Normalization, Edit Distance. URL:
https://web.stanford.edu/~jurafsky/slp3/2.pdf.
[55] D. Jurafsky, J. H. Martin, Deep Learning Architectures for Sequence Processing. URL:
https://web.stanford.edu/~jurafsky/slp3/9.pdf.
[56] D. Jurafsky, J. H. Martin, Naive Bayes and Sentiment Classification. URL:
https://web.stanford.edu/~jurafsky/slp3/4.pdf.
[57] D. Jurafsky, J. H. Martin, Logistic Regression. URL:
https://web.stanford.edu/~jurafsky/slp3/5.pdf.
[58] D. Jurafsky, J. H. Martin, Neural Networks and Neural Language Models. URL:
https://web.stanford.edu/~jurafsky/slp3/7.pdf.
[59] I. Khomytska, V. Teslyuk, N. Kryvinska, I. Bazylevych, Software-based approach towards
automated authorship acknowledgement-chi-square test on one consonant group, Electronics
(Switzerland) 9(7) (2020) 1–11.
[60] A. R. Sydor, V. M. Teslyuk, P. Y. Denysyuk, Recurrent expressions for reliability indicators of
compound electropower systems, Technical Electrodynamics 4 (2014) 47–49.
[61] V. Lytvyn, et al., Development of the linguometric method for automatic identification of the
author of text content based on statistical analysis of language diversity coefficients,
Eastern-European Journal of Enterprise Technologies 5(2(95)) (2018) 16–28. doi:
10.15587/1729-4061.2018.142451.
[62] V. Lytvyn, et al., Development of the system to integrate and generate content considering the
cryptocurrent needs of users, Eastern-European Journal of Enterprise Technologies 1(2(97))
(2019) 18–39. doi: 10.15587/1729-4061.2019.154709.
[63] P. Kravets, The Game Method for Orthonormal Systems Construction, in: Proceedings of the
9th International Conference - The Experience of Designing and Applications of CAD Systems
in Microelectronics, 2007. doi: 10.1109/cadsm.2007.4297555.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>