Advanced Method of Synthesis of Semantic Kernel of E-content
Sergey Orekhov
National Technical University “Kharkiv Polytechnic Institute”, Kyrpychova str. 2, Kharkiv, 61002, Ukraine


                Abstract
                The paper describes an improved method for synthesizing the semantic kernel of e-content.
                This method is an integral part of a new technology of virtual promotion of goods and
                services. This technology is an alternative way to solve the problem of search engine
                optimization on the Internet. Its main components are the semantic kernel and the map of
                virtual promotion. The previously obtained results of WEB projects of virtual promotion
                show that the correct choice of the semantic kernel improves the main WEB metrics
                tenfold. Therefore, solving the problem of kernel synthesis is an urgent task. In this
                article, a synthesis method is proposed which is based on a text model that includes
                aspect terms and collocations. The algorithm and the results of its application in a real
                WEB project for the virtual promotion of online services in the US market are also presented.

                Keywords
                Semantic kernel, virtual promotion, search engine optimization, bag of words

1. Introduction and related works
    The concept of a semantic kernel was first used in [1-2] to solve the problem of text
classification. This task aims at automatically organizing text documents according to given
categories. To solve it, each text document is represented in the form of a so-called "bag of words"
(BOW). The BOW approach is simple and is very often used to describe the semantic kernel. Its main
limitation is that it assumes independence between terms: documents in the BOW model are
represented by their terms, ignoring their position in the document and their semantic or syntactic
relationships with other words.
    The BOW model clearly does not take into account multi-word expressions, breaking them into
parts. In addition, it treats polysemous words (that is, words with multiple meanings) as a single
entity. For example, the term "organ" can have the meaning of a body part when it appears in a
context related to a biological structure, or the meaning of a musical instrument when it appears in a
context related to music.
    The work [2] states that each class of terms, that is, each "bag of terms", has two types of
vocabulary: a "core" vocabulary closely related to the subject of this class, and a "general"
vocabulary, which may have a similar distribution in different classes. Thus, two documents from
different classes can have many words in common and can be considered similar under the BOW
representation.
    To address these problems, several methods have been proposed that use a measure of relatedness
between terms in the areas of word sense determination, text classification, and information retrieval.
Approaches to computing semantic relatedness can be divided into three categories: knowledge-based
systems, statistical (corpus-based) approaches, and hybrid methods that combine ontology-based and
statistical information.
    Knowledge-based systems use a thesaurus or ontology to improve the representation of terms by
taking advantage of the semantic relatedness between terms [3-6]. For example, in [4] the distance
between words in WordNet is used to detect semantic similarity between English words. The study
[4] uses superconcept declaration with different distance measures between words from WordNet,

MoMLeT+DS 2022: 4th International Workshop on Modern Machine Learning Technologies and Data Science, November, 25-26, 2022,
Lviv-Leiden, Ukraine
EMAIL: sergey.v.orekhov@mail.com (S. Orekhov)
ORCID: 0000-0002-5040-5861 (S. Orekhov)
           ©️ 2022 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)
such as inverted path length, Wu-Palmer measure, Resnik measure, and Lin measure. The second type
of computation of semantic relatedness between terms is corpus-based systems, in which some
statistical analysis is performed based on the relations of terms in a set of training documents to reveal
hidden similarities between them [7]. One of the well-known corpus systems is Latent Semantic
Analysis (LSA) [8], which partially solves the synonymy problem. Finally, approaches in the last
category are called hybrid because they combine information derived from both ontology and
statistical corpus analysis [3].
    Also, in [2] it was proposed to analyze the text corpus through the creation of a number of
semantic kernels, which are divided into groups: higher-order kernels, iterative semantic kernels, and
lower-order kernels. This classification makes it possible to improve the efficiency of document
classification compared to traditional approaches. Semantic kernels of the following types are
distinguished: linear, polynomial, and RBF kernels, which take advantage of higher-order
relationships between terms and documents. The basis for such a classification is the classic model of
generating queries to the document database within a search server, the vector space model [9]. Thus,
vector space model metrics are used to assess the degree of relatedness of terms in the "bag of
terms".
    As is known, the semantic kernel is a set of keywords formed on the basis of a given criterion. An
example of such a criterion is semantic relatedness between terms. However, the approach proposed
above cannot be used in the case of virtual promotion, because the semantic kernel to be formed
includes a set of terms describing a given class of needs. Such a set of terms is connected not only by
semantics, but, more importantly, by the need of a potential client to buy a given product. That is, the
connection between the terms is formed mainly by market events and the laws of marketing.
    Thus, in the technology of virtual promotion, it is necessary to form a semantic kernel as a set of
keywords that are connected to each other both semantically and on the basis of market events, which
are confirmed by the "4P" principle.
    In virtual promotion, the semantic kernel plays two important roles.
    The first role is as a source of messages in the promotion channel. The kernel describes the
product being sold. It should have a semantic affinity with a class of need. That is, an additional
kernel is also required that describes the need this product covers in the life of a potential buyer.
There can be any number of such classes of needs. In addition, each kernel, according to marketing
principles, should contain keywords that emphasize its uniqueness for the buyer. Thus, on the one
hand, the semantic kernel has semantic affinity with popular queries on the Internet; on the other
hand, it has unique elements that distinguish it from others. This allows one product to be separated
from another in the mind of a potential buyer.
    The work [11] proved the need to divide keywords from the kernel into classes, according to
popular queries in search servers. Such classes are formed according to the main requests from the
buyer: what needs are covered, where and when the product can be purchased. Such a structure is
acceptable for the buyer, because it helps him quickly find the product and compare it with analogues.
    The second role is as a means of managing the process of virtual promotion. By changing the
kernel, it is possible to improve the results of applying our technology, as shown in [11-13]. It was
established that a gradual increase of the semantic kernel in the promotion channel leads to an
increase in the value of the main WEB metric, the traffic of the WEB resource.
    Given these prerequisites, let us build a mathematical model of the semantic kernel based on the
text model [11].

2. Mathematical model of semantic kernel
    As input data we have a text document D, which can be written in any language and presented as
HTML or plain text:

    $D = \{s_1, \dots, s_n\}$, $s_i = kw_i \cup gw_i \cup es_i$,                                    (1)

    where $s_i$, $i = \overline{1, n}$, is a sentence that ends with a period or another end-of-sentence
sign. Each sentence is a combination of three sets: a set of product description keywords $kw_i$, a set
of sentence-ending characters $gw_i$, and a set of keywords $es_i$ that express the general content.
    Based on the approach [10], in which the user's query should include the answer to three questions
(what, where, when) to describe the product, formula (1) can be simplified:

    $D = \{s_1, \dots, s_n\}$, $s_i = kw_i \cup gw_i$, $s_i = \{w_{i1}, \dots, w_{im}\}$,           (2)

    That is, the set $es_i$ can be ignored. For selecting keywords, morphological homonymy exclusion
methods can be used to build semantic kernels from the first up to at least the fifth order, or unigram,
trigram and higher-order models can be applied (a minimal sketch of this text representation is given
below).
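    To make the representation (1)-(2) concrete, the following minimal sketch (in Python) splits a raw
text into sentences and the three sets of formula (1); the trivial tokenization and the externally
supplied keyword vocabulary are simplifying assumptions made for illustration, not part of the
method itself.

```python
import re

def split_document(text, keyword_vocab):
    """Represent D = {s1, ..., sn} with si = kwi U gwi U esi (formula (1)):
    kwi - product-description keywords, gwi - sentence-ending characters,
    esi - the remaining words that express the general content."""
    sentences = []
    for raw in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"\w+", raw.lower())
        kw = {w for w in words if w in keyword_vocab}       # product keywords
        gw = set(re.findall(r"[.!?,;]", raw))               # sentence-ending signs
        es = set(words) - kw                                # general-content words
        sentences.append({"kw": kw, "gw": gw, "es": es})
    return sentences

doc = "CelestialTiming is an accurate tool. It brings users in touch with natural rhythms."
print(split_document(doc, {"celestialtiming", "tool", "rhythms"}))
```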
    This will make it possible to build semantic kernels that include more than two semantically
related keywords. In addition, such a set of words (for example, three or more) guarantees a more
detailed description of the product, which strengthens the marketing component, that is, the
connection of keywords from a marketing point of view. A typical trigram model looks like this:

    $w_{ij} = \arg\max P(w_{ij} \mid w_{i,j-1}, w_{i,j-2}) P(w_{ij} \mid w_{i,j-1}, w_{i,j+1}) P(w_{ij} \mid w_{i,j+1}, w_{i,j+2})$.   (3)
    Formula (3) uses probability to link three keywords into a single complex. In this case, the
probability can be calculated as follows:

    $P(w_{ij} \mid w_{i,j-1}) = \frac{F(w_{ij}, w_{i,j-1})}{F(w_{ij})}$,                            (4)

    where $w_{ij}$, $w_{i,j-1}$ are keywords, $F(w_{ij}, w_{i,j-1})$ is the frequency of occurrence of the
two words together, and $F(w_{ij})$ is the total frequency of occurrence of the word $w_{ij}$ in the text
document D.
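    As a minimal illustration of formulas (3)-(4), the sketch below estimates the conditional
probability of a word pair from raw co-occurrence counts and multiplies such estimates to score a
three-word complex; the function name and the toy document are assumptions made for illustration
only.

```python
from collections import Counter

def pair_probability(words, target, prev):
    """Estimate P(target | prev) as in formula (4): the frequency of the pair
    divided by the total frequency of the target word in the document."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    f_pair = bigrams[(prev, target)]
    f_word = unigrams[target]
    return f_pair / f_word if f_word else 0.0

# Toy example: score a candidate three-word complex as a product of pair probabilities.
doc = "easy accurate tool easy accurate timing easy tool".split()
score = (pair_probability(doc, "accurate", "easy")
         * pair_probability(doc, "tool", "accurate"))
print(score)
```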
    In addition, it is advisable to also consider parameters of the text document D itself, among which
the following should be noted:

    $S = \frac{P_n + P_p}{P_a + P_v}$,                                                              (5)

    $Q = \frac{P_a + P_{adv}}{P_n + P_v}$,                                                          (6)

    $A = \frac{P_v}{N}$,                                                                            (7)

    $Di = \frac{P_v}{P_a + P_n + P_p}$,                                                             (8)

    $Z = \frac{P_C}{P_S}$,                                                                          (9)

    where S is the objectivity of the text document, Q is its quality, A is its activity, Di is the
dynamism of the text document, Z is its coherence, $P_n$ is the number of nouns, $P_a$ is the number
of adjectives, $P_v$ is the number of verbs and verb forms (participles and verbal adverbs), $P_p$ is
the number of pronouns, $P_{adv}$ is the number of adverbs, $P_C$ is the number of prepositions and
conjunctions, $P_S$ is the number of independent sentences in the text, and N is the number of words
in the text document.
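    A small sketch of formulas (5)-(9), assuming that the part-of-speech counts of a document have
already been obtained (for example, with any available POS tagger); the data structure and field
names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class PosCounts:
    nouns: int        # Pn
    adjectives: int   # Pa
    verbs: int        # Pv, verbs and verb forms
    pronouns: int     # Pp
    adverbs: int      # Padv
    prep_conj: int    # PC, prepositions and conjunctions
    sentences: int    # PS, independent sentences
    words: int        # N, total words

def text_metrics(c: PosCounts) -> dict:
    """Compute the document parameters (5)-(9)."""
    return {
        "S":  (c.nouns + c.pronouns) / (c.adjectives + c.verbs),   # objectivity, (5)
        "Q":  (c.adjectives + c.adverbs) / (c.nouns + c.verbs),    # quality, (6)
        "A":  c.verbs / c.words,                                   # activity, (7)
        "Di": c.verbs / (c.adjectives + c.nouns + c.pronouns),     # dynamism, (8)
        "Z":  c.prep_conj / c.sentences,                           # coherence, (9)
    }

# Example with counts of a short hypothetical document.
print(text_metrics(PosCounts(12, 5, 7, 3, 4, 9, 6, 80)))
```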
    Why should homonyms be analyzed? Because each product probably has several situations
(needs) in which it should be used. Then, if there are several descriptions of the need, that is, several
classes of need covered by the given product, the semantic kernel specifying the product may belong
to any one of these classes.
    By changing the kernel, it is possible to move virtually from one class of need to another. Thus,
the analysis of homonyms guarantees the correct selection of classes of needs and their connection
with semantic kernels.
    Formulas (5)-(9) make it possible to analyze a text document for the purpose of identifying classes
of needs as the first step towards building a semantic kernel.
    In computational linguistics, syntactically correct word combinations that are stable in a statistical
sense are usually called collocations. Most multi-word terms are collocations. The MI measure and its
modifications are most often used to identify terms as collocations:
    $MI = \log_2 \frac{f(a,b) N}{f(a) f(b)}$,                                                       (10)

    where N is the number of words in the text document, f(a,b) is the frequency of co-occurrence of
words a and b, which evaluates the degree of dependence of the occurrence of the two words in the
corpus on each other, and f(a), f(b) are the frequencies of occurrence of words a and b separately
from each other.
    If the identified two-word collocations are considered as a whole, then with the help of the
mentioned measures, longer word combinations (three-word, four-word, etc.) can be recognized in the
text, which allows to extract long terms with an arbitrary syntactic structure using statistical criteria.
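    The following sketch shows how formula (10) can be applied to rank two-word candidates;
restricting co-occurrence to adjacent words and the chosen toy tokens are simplifying assumptions of
this illustration.

```python
import math
from collections import Counter

def mi_scores(words):
    """Score adjacent word pairs with MI = log2(f(a,b) * N / (f(a) * f(b)))."""
    n = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return {
        (a, b): math.log2(f_ab * n / (unigrams[a] * unigrams[b]))
        for (a, b), f_ab in bigrams.items()
    }

# Pairs with the highest MI are the first candidates for collocations; treating a
# found pair as a single token and re-running the scoring yields three-word and
# longer collocations, as described above.
tokens = "virtual promotion of goods virtual promotion of services semantic kernel".split()
for pair, score in sorted(mi_scores(tokens).items(), key=lambda x: -x[1])[:3]:
    print(pair, round(score, 2))
```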
    Thus, two mechanisms are established: the first is how to exclude or find homonyms, the second
is how to evaluate collocations. A collocation is an ideal means of describing the semantic kernel.
That is, our task is to identify collocations while excluding homonyms.
    Then, as a preliminary formulation, the task of forming the semantic kernel is stated as follows:
after evaluating the text document D with (5)-(9) for the presence of homonyms, determine
collocations of keywords using metric (10).
    The construction of the method of forming the semantic kernel faces two problems. The first is to
establish the semantic relatedness of the keywords in the kernel. The second problem is that metrics
(5)-(10) depend on the number of words in a text document, while modern marketing strategies aim
to create descriptions of goods and services in an abbreviated style.
    Therefore, the specified metrics may not accurately express the text attributes we need.
    To solve these tasks, the mechanism of determining aspect keywords can be applied, because such
keywords most often form collocations, and do so precisely for the definition of goods and services.
    If both individual nouns and noun groups are extracted as aspect terms, it is necessary to use
additional features to determine the length of the noun group more accurately. Most often, so-called
contextual features are used, which compare the frequency of occurrence of a word combination with
the frequency of its context. Such features allow the boundaries of the noun group to be determined.
For example, the so-called FLR measure is used:

    $FLR(a) = f(a) LR(a)$,                                                                          (11)

    $LR(a) = l(a) r(a)$,                                                                            (12)

    where f(a) is the frequency of appearance of the aspect keyword a, l(a) is the number of different
words to the left of a, and r(a) is the number of different words to the right of a.
    Next, noun groups whose measure is greater than the average value over all phrases are selected.
Thus, this measure primarily selects nouns that have a large variety of words at their boundaries,
indicating that the analyzed term a is not a fragment of a longer phrase.
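    A brief sketch of formulas (11)-(12) under the simplifying assumption that a candidate's left and
right contexts are the words immediately adjacent to each of its occurrences; the function name and
the toy tokens are illustrative.

```python
def flr(words, candidate):
    """FLR(a) = f(a) * LR(a), with LR(a) = l(a) * r(a):
    f(a) - number of occurrences of the candidate a,
    l(a) - number of different words seen immediately to its left,
    r(a) - number of different words seen immediately to its right."""
    left, right, freq = set(), set(), 0
    for i, w in enumerate(words):
        if w == candidate:
            freq += 1
            if i > 0:
                left.add(words[i - 1])
            if i + 1 < len(words):
                right.add(words[i + 1])
    return freq * len(left) * len(right)

tokens = "accurate tool for timing easy tool new tool here".split()
print(flr(tokens, "tool"))  # candidates above the average FLR are kept
```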
    Another criterion aimed at the same goal is the well-known C-value metric [11], which reduces
the weight of a given word or phrase if it is part of a longer frequent phrase. In this case, it is assumed
that the longer phrase can be considered a candidate aspect, while the current one is its fragment. This
feature for selecting aspects is used in the present work.
    Then finally, the task of forming a semantic kernel can be defined as follows: given a text
document D consisting of a number of sentences, select a set of aspectual collocations of at least the
third order to ensure a rational level of semantic and marketing relatedness.

3. Proposed method
    First, let us describe the formation method verbally. As input information, we have a text
document D given by formula (1). But the set $es_i$ is redundant, so it must first be removed to obtain
a document of the form (2). Then we introduce the normalization and lemmatization operation
$NL: D \to D_{nl}$, where $D_{nl} = KW \cup GW$. The set of keywords of general content cannot be
excluded, because these words play the role of a link between the main aspect terms.
    Next, the operation of determining aspect terms and the operation of determining collocations
should be applied to the set $D_{nl} = KW \cup GW$: $A: D_{nl} \to D_{nl}^{a}$ and $K: D_{nl}^{a} \to D_{nl}^{ak}$.
As a result, the set $D_{nl}^{ak} = KW_{nl}^{ak} \cup GW_{nl}^{ak}$ is formed. This set will be the source of
semantic kernels.
    However, the presence of a set of keywords that can be part of the semantic kernel is not enough
in our case. The reason is that the formation of this set was performed taking into account only the
semantics and properties of the document D, even if the text corpus $TDC = \{D_1, \dots, D_C\}$,
$D \in TDC$, is considered. We must also consider the other side, namely WEB statistics of the use of
the set of keywords from $D_{nl}^{ak}$. Therefore, we add one more operation, the purpose of which is
to form an estimate of each keyword and each word combination that can be formed on the basis of
the set $D_{nl}^{ak}$. We introduce an evaluation metric based on WEB statistics:

    $F_{web}: D_{nl}^{ak} \to M_{web}$, $M_{web} \in R$, $M_{web} \ge 0$.

    As a result, pairs $SK = \{(D_{nl}^{ak}, M_{web})_1, \dots, (D_{nl}^{ak}, M_{web})_C\}$ are formed for each
document of the text corpus TDC. We will assume that the keywords with the maximum value of the
metric $M_{web}$ fall into the semantic kernel:

    $sk = \max_{w \in TDC} M_{web}(w)$,

    where w is a phrase that includes at least two keywords.
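    A compact sketch of the evaluation step: candidate phrases from $D_{nl}^{ak}$ are scored by an
external source of WEB statistics (here a stub dictionary of search volumes, which is a pure
assumption) and the highest-scoring phrase is taken as the kernel sk.

```python
def select_kernel(candidates, web_stats):
    """F_web maps each candidate phrase to a non-negative score M_web;
    the phrase with the maximum score becomes the semantic kernel sk."""
    scored = {phrase: web_stats.get(phrase, 0.0) for phrase in candidates}
    return max(scored, key=scored.get), scored

# Hypothetical search-volume statistics for candidate phrases.
web_stats = {"celestial timing tool": 320.0, "accurate timing": 140.0}
candidates = ["celestial timing tool", "accurate timing", "natural rhythm"]
kernel, scores = select_kernel(candidates, web_stats)
print(kernel, scores)
```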
     Thus, to form a kernel, the following operations must be performed in sequence: normalization,
lemmatization, aspectization, collocation and evaluation. In addition, it was shown above that it is not
enough to have only a semantic kernel. It is necessary to first form classes of needs, that is, sets of
keywords that describe the unique needs covered by particular products. These are components of a
marketing strategy, which describes how a given product or service can be used for an urgent need of
a given group of customers.
     Then, first of all, the text corpus should be analyzed to identify classes of needs. For this, metrics
(5)-(9) should be used. Accordingly, our method should be supplemented with one more step: finding
a class of needs.
     The work assumes that one document contains a description of the product for sale or a
description of the need that this product covers. It is also highly probable that one document contains
both descriptions at once. We will therefore assume that the text corpus contains 20% of documents
expressing the main idea about the need and the product. In order to find this set of documents, it is
necessary to rank them. It is suggested to first calculate the values of metrics (5)-(9) for each
document. A document with the maximum value contains keywords that describe as fully as possible
one of the classes of need and the product that covers it.
     As a result, two tasks are formed. The first task is to order the set of documents in order to select
a list of documents in which keywords should be searched for when forming semantic kernels. The
order determines the priority according to which the classes in our study are organized. That is, the
single document with the best score is the document that contains a separate class of need.
     The second task involves searching for documents where there is no description of need classes,
but there is a description of semantic kernels that belong to some need class. That is, this set of
documents serves to confirm the existence of a class of need and a product that covers it.
     This structure of tasks is based on the classical PageRank search method [14]. The main principle
of this method is that the WEB resource that has the maximum number of links from other WEB
resources receives the maximum score. Based on the Pareto principle, we select 20% of the
documents with the maximum values of the document quality assessment metrics (5)-(9), because
they contain 80% of all the semantic connections and marketing knowledge we need [15].

4. Proposed algorithm
    The paper proposes a method for forming the semantic kernel which includes two cycles.
Consider the first cycle and its verbal algorithm.
    Stage 1. We take the i-th document of the corpus TDC for processing and calculate metrics
(5)-(9).
    Stage 2. We build Table 1, which accumulates the list of documents and the values of metrics
(5)-(9). These two stages are repeated until Table 1 is completely filled with all the documents
available in the text corpus.

Table 1
Search for documents describing classes of needs
   Document Number          S              Q              A              Di             Z             Sum

     Stage 3. We calculate the sum of these metrics and sort all documents of the text corpus in
descending order of the "Sum" column. As a result, we choose the first values of this column,
approximately 20% of the total number of documents. Thus, the first documents by the value of the
last column express the classes of need and the description of the product that covers them. These
documents form the set $TDC'$. It is used as input information for the second cycle of our method to
build the set of keywords from which the semantic kernel of e-content will be formed.
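     A sketch of the first cycle, assuming the metric values of Table 1 are already available for every
document (the earlier metrics sketch could supply them); the 20% share follows the Pareto principle
described above.

```python
def first_cycle(doc_metrics, share=0.2):
    """Rank documents by the sum of metrics (5)-(9) and keep the top share:
    these documents form the set TDC'."""
    sums = {doc: sum(m.values()) for doc, m in doc_metrics.items()}
    ranked = sorted(sums, key=sums.get, reverse=True)
    keep = max(1, round(len(ranked) * share))
    return ranked[:keep], sums

# Hypothetical per-document metric values (cf. Table 5).
doc_metrics = {
    "home":    {"S": 1.336, "Q": 0.265, "A": 0.127, "Di": 0.347, "Z": 0.417},
    "free":    {"S": 1.263, "Q": 0.407, "A": 0.125, "Di": 0.303, "Z": 3.400},
    "contact": {"S": 1.500, "Q": 0.469, "A": 0.057, "Di": 0.111, "Z": 7.667},
}
print(first_cycle(doc_metrics))
```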
     Consider the second cycle. We will assume that the input information is the set of documents
$D \in TDC'$. The following stages are suggested.
     Stage 1 (normalization). We take the i-th document $D_i \in TDC'$ of the corpus for processing.
From all the sentences of the document $D_i$ we select three groups of keywords
$s_{ij} = kw_{ij} \cup gw_{ij} \cup es_{ij}$. Next, we exclude the set of words $es_{ij}$ completely. As a
result, each corpus document includes two groups of keywords: aspect terms and general content
words ($s_{ij} = kw_{ij} \cup gw_{ij}$, $s_{ij} \in D_i$).
     Stage 2 (search for collocations). We build Table 2, which allows us to identify the list of
candidates for collocations of various orders. In Table 2 we enter the words $w \in kw_{ij} \cup gw_{ij}$
and determine the value of metric (10).
    As can be seen, Table 2 is a matrix that allows two-word collocations to be identified. But for our
research it is also important to know about the existence of three- and four-word collocations.
Therefore, at this stage, having obtained two-word collocations, it is necessary to rebuild Table 2 in
order to iteratively analyze more complex combinations of keywords. This process continues until the
values of metric (10) remain unchanged, that is, until all complex collocations are obtained.

Table 2
Candidates for collocations
     Metric MI       Candidate 1              Candidate 2    Candidate 3            …     Candidate M
    Candidate 1
        …
   Candidate M

     We will denote the set of collocations as follows: $D_i = \{col_i^2\} \cup \{col_i^3\} \cup \dots \cup \{col_i^R\}$,
where $col_i^2$ are collocations of the second order, $col_i^3$ are collocations of the third order, and
so on.
    It has been shown empirically that for virtual promotion it is enough that $R \le 5$. This is due to
the fact that queries of more than five words have a very low probability in search engines [16].
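    The iterative rebuilding of Table 2 can be sketched as follows: after each pass, adjacent pairs
whose MI exceeds a threshold are merged into single tokens and the scoring is repeated, up to the
empirical limit R = 5; the threshold value is an assumption of this sketch.

```python
import math
from collections import Counter

def _mi_scores(tokens):
    """MI of adjacent token pairs, as in formula (10)."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {p: math.log2(f * n / (uni[p[0]] * uni[p[1]])) for p, f in bi.items()}

def grow_collocations(tokens, mi_threshold=3.0, max_order=5):
    """Iteratively merge strong adjacent pairs, producing collocations of
    order 2, 3, ... up to the empirical limit R = 5."""
    for _ in range(max_order - 1):
        strong = {p for p, s in _mi_scores(tokens).items() if s >= mi_threshold}
        if not strong:
            break  # metric (10) values no longer change: all collocations found
        merged, skip = [], False
        for a, b in zip(tokens, tokens[1:] + [""]):
            if skip:
                skip = False
                continue
            if (a, b) in strong:
                merged.append(a + " " + b)  # treat the found pair as one token
                skip = True
            else:
                merged.append(a)
        tokens = merged
    return [t for t in tokens if " " in t]  # keep multi-word collocations only
```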
      Stage 3 (search for aspect terms). Again, for each document $D_i$ we consider the keywords
$w \in kw_{ij} \cup gw_{ij}$ and calculate metrics (11)-(12) for each candidate in Table 3.
    The largest values of metrics (11)-(12) allow, again using the Pareto principle, about 20% of the
aspect terms to be determined from their total number in the document $D_i = \{w\}_i$. Next, by
comparing the data of Tables 2 and 3, we obtain a list of collocations in which aspect terms are
present.
    These collocations are the first candidates for the semantic kernel of e-content. That is, we get an
intermediate result: $D_i = \{col_i^{a2}\} \cup \{col_i^{a3}\} \cup \dots \cup \{col_i^{aR}\}$.
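    A sketch of this stage: the FLR-scored candidates of Table 3 are ranked, the top 20% are taken as
aspect terms, and only the collocations of Table 2 that contain at least one aspect term are kept; the
input values are hypothetical.

```python
def aspectual_collocations(collocations, flr_scores, share=0.2):
    """Keep the top-share candidates by FLR as aspect terms (Pareto principle)
    and select only collocations that contain at least one aspect term."""
    ranked = sorted(flr_scores, key=flr_scores.get, reverse=True)
    aspects = set(ranked[:max(1, round(len(ranked) * share))])
    kept = [c for c in collocations if any(a in c.split() for a in aspects)]
    return aspects, kept

# Hypothetical inputs: collocations from Table 2 and FLR values from Table 3.
print(aspectual_collocations(
    ["accurate tool", "natural rhythm", "business decision"],
    {"tool": 12, "rhythm": 9, "universe": 3, "decision": 2, "goal": 1},
))
```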

Table 3
Candidates for aspect terms
   No    Candidate (aspect term a)            f(a)           FLR(a)            LR(a)            Priority


      Stage 4 (adding aspect terms). At the same time, having only collocations in the semantic kernel
is inefficient, so it is necessary to add to the set of candidates some aspect terms $AT_i$ that are not
included in the collocations, for each document:
$D_i = \{col_i^{a2}\} \cup \{col_i^{a3}\} \cup \dots \cup \{col_i^{aR}\} \cup \{AT_i\}$.
    Having the semantic kernels of the e-content of a given WEB resource, it is possible to activate
the process of virtual promotion on the Internet, because, as was shown in [17], the semantic kernel is
the message and the driving impulse of our virtual promotion.
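    Stage 4 then reduces to a union of the kept collocations with the remaining aspect terms; a
one-function sketch with illustrative names:

```python
def candidate_set(aspect_collocations, aspect_terms):
    """Form the final candidate set D_i: the aspectual collocations plus the
    aspect terms AT_i that did not enter any collocation."""
    used = {w for col in aspect_collocations for w in col.split()}
    leftover = [a for a in aspect_terms if a not in used]
    return list(aspect_collocations) + leftover

print(candidate_set(["accurate tool"], ["tool", "rhythm", "goal"]))
```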
   The action diagram that describes the proposed algorithm is presented in Figure 1.
   Consider an example calculated according to the proposed algorithm based on the results of the
WEB project.

5. Results
    Let us consider the initial conditions that existed at the start of the WEB project for the American
market of WEB services. This project was started to meet the need of users to build a psychological
portrait of an individual online. The WEB project was started at a moment when classical methods of
search engine optimization no longer gave any effect.
    The desired effect was primarily understood as an increase in the number of users of this WEB
service. This goal can be achieved by increasing the number of visits to the WEB resource.
Therefore, the synthesis of a semantic kernel was offered to the owner of this WEB resource in order
to increase the value of the WEB metric - traffic.
    At the time of the start of the test WEB project (Figure 2), the value of this metric was minimal.
Therefore, it was proposed to use the new approach (kernel synthesis) in order to increase traffic.
    Table 4 shows a fragment of the e-content of this site at the time of the start of synthesis. This
WEB resource included a list of 13 documents, only nine of which contain textual e-content about the
market, need or product. At the first stage, the first cycle of the semantic kernel synthesis algorithm
specified in Section 4 was performed. Table 5 contains the results of processing these nine
documents. According to the algorithm, we select two documents to create the set of keywords that
will be candidates for the semantic kernel; they are also shown in Table 5.
     Next, we perform the second cycle of the synthesis algorithm. To do this, we remove all words
except nouns, adjectives, adverbs and verbs and normalize the remaining words. But first, we choose
the document with the maximum value of the document quality metrics - the eighth line of Table 5.
The e-content of this page is presented in Table 4. Figure 3 presents the keywords and the calculation
of the MI measure for this e-content.
     The data in Figure 3 demonstrate that, due to the small number of keywords in the e-content, it is
not possible to detect collocations. Therefore, we proceed to the next step of our algorithm, namely
the analysis of aspect terms.
     The search for aspect keywords is based on the calculation of the metrics shown in Figure 4. The
following sequence of aspect keywords was revealed: "CelestialTiming, easy, accurate, tool, bring,
user, site, natural, rhythm, providing, goal, universe, effective, collaboration". Thus, our algorithm
forms a set of candidates for the semantic kernel of e-content.
    Within this test project, the following strategy for launching the semantic kernel was proposed. At
the first stage, because this WEB resource was a startup and potential users of the service had no
information about it yet, a semantic kernel consisting of only one aspect keyword - CelestialTiming -
was implemented. At the second stage, the semantic kernel is expanded with the following words
from the found set.
    [Figure 1 summarizes the proposed algorithm as an activity diagram. First cycle (sentence
processing): upload the text corpus TDC; process each document $D_i$ and define the metrics S, Q,
A, Di, Z; repeat until the last document of the corpus; then process the document table and form the
set of documents $D \in TDC'$. Second cycle (keyword processing): keyword normalization,
collocation search, aspect term search and adding of aspect terms for each document; finally, the set
of candidates $D_i = \{col_i^{a2}\} \cup \{col_i^{a3}\} \cup \dots \cup \{col_i^{aR}\} \cup AT_i$,
$i = \overline{1, |TDC'|}$, is formed.]

Figure 1: Activity diagram


Figure 2: Traffic statistics of the test WEB project (the starting point of the synthesis of the semantic
kernel is marked)
Table 4
A fragment of the e-content of the first test WEB resource
    No                                 E-content                                       Comment
     1  CelestialTiming is the product of international collaboration with a        Description of the
        goal of providing site users with an accurate and easy tool to bring          main product
        them in touch with their natural rhythms of the universe and
        enable them to make more effective personal and business
        decisions.
        Members of the CelestialTiming team have extensive experience
        and advanced degrees in the fields of physics, engineering,
        computer science, psychology, and education. They have developed
        professional psychological and astrological software, taught in
        colleges and universities, and published books and articles on a
        variety of topics…

Table 5
Search for documents describing classes of needs
   No       Web page          S             Q             A           Di          Z             Sum
    1         Home          1.336        0.265          0.127       0.347       0.417          2.491
    2       celebrity         1.196        0.339        0.138       0.415       2.125          4.213
    3         myself          0.778        0.455        0.152       0.455           2.0        3.838
    4        another          0.636         0.5         0.182        0.5            2.0        3.818
    5         saved           0.83         0.451        0.152       0.403       1.609          3.445
    6      applications       1.071        0.287        0.130       0.477       2.764          4.728
    7          free           1.263        0.407        0.125       0.303           3.4        5.499
    8        contact           1.5         0.469        0.057       0.111       7.667          9.803
    9        account          0.667        0.455        0.15        0.471           2.25       3.991




Figure 3: A fragment of the table with calculations of the value of the MI metric
Figure 4: Calculating the values of the metrics for finding candidates for aspect keywords

   The results obtained (Figure 2) show that the introduction of the new semantic kernel, which
symbolizes the brand of our online service, led to an increase in traffic. That is, new users who were
interested in learning about the new online service came to the WEB site. However, this effect was
short-lived. This is understandable, since the effect of aging of the semantic kernel, established in
[11-12], manifested itself in this case.

6. Summary

    The paper demonstrates a new, improved approach to the synthesis of the semantic kernel of e-
content. This approach has been tested on a real WEB project in the US online services market. The
article shows the effect of the implementation of the synthesis algorithm, which was positive for the
real WEB project.
    Thus, we can say that the following new results have been obtained:
      the task of forming the semantic kernel of e-content received further theoretical and
methodological development;
      for the first time, it is proposed to apply metrics for evaluating texts for evaluating e-content
keywords and forming a semantic kernel, taking into account the entire text corpus of a given WEB
resource;
      the method of forming the semantic kernel of e-content, which includes two cycles, was
further developed. The first is for establishing a list of documents for processing, and the second is for
forming candidates (key phrases and words) for the semantic kernel.
   In the future, it is planned to implement this method as a separate software component on the
Node JS platform.

7. References
[1] Altinel B., Ganiz M. C., Diri B. A simple semantic kernel approach for SVM using higher-order
     paths. // IEEE International Symposium on Innovations in Intelligent Systems and Applications
     (INISTA) Proceedings. – 2014. – P. 431-435.
[2] Altınel B., Ganiz M., Diri B. A corpus-based semantic kernel for text classification by using
     meaning values of terms. Engineering Applications of Artificial Intelligence. 2015. – Vol. 43. –
     P. 54-66.
[3] Nasir J.A., Varlamis I., Karim A., Tsatsaronis G. Semantic smoothing for text clustering. //
     Knowledge-Based Systems. – 2013. – Volume 54. – P. 216-229.
[4] Budanitsky, A., Hirst, G. Evaluating WordNet-based measures of lexical semantic relatedness. //
     J. Computational Linguistics. – 2006. – Volume 32(1). – P. 13–47.
[5] Bloehdorn, S., Basili, R., Cammisa, M., Moschitti, A. Semantic kernels for text classification
     based on topological measures of feature similarity. // Proceedings of the Sixth International
     Conference on Data Mining (ICDM). – 2006. – P. 808–812.
[6] Luo Q., Chen E., Xiong H. A semantic term weighting scheme for text categorization. // Expert
     Systems with Applications. – 2011. – Volume 38(10). – P. 12708-12716.
[7] Zhang, Z., Gentile, A.L., Ciravegna, F. Recent advances in methods of lexical semantic
     relatedness–a survey. // Natural Language Engineering. – 2012. – Volume 1(1). – P. 1–69.
[8] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman, R. Indexing by latent
     semantic analysis. // Journal of the American society for information science. – 1990. – Volume
     41(6). – P. 391-407.
[9] Salton G., Wong A., Yang C. A vector space model for automatic indexing. // Computer science.
     – 1975. – Volume 18. – P. 613-620.
[10] Orekhov S., Malyhon H., Liutenko I., Goncharenko T. Using Internet News Flows as Marketing
     Data Component. // CEUR-WS, 2020. – Volume 2604. – P. 358-373.
[11] Orekhov S., Malyhon H., Goncharenko T. Mathematical Model of Semantic Kernel of WEB site.
     // CEUR-WS, 2021. – Vol. 2917. – P. 273-282.
[12] Orekhov S., Malyhon H., Stratienko N., Goncharenko T. Software Development for Semantic
     Kernel Forming. // CEUR-WS, 2021. – Vol. 2870. – P. 1312-1322.
[13] Godlevsky M., Orekhov S., Orekhova E. Theoretical Fundamentals of Search Engine
     Optimization Based on Machine Learning. // CEUR-WS, 2017. – Vol. 1844. – P. 23-32.
[14] Dode A., Hasani S. PageRank Algorithm. // Journal of Computer Engineering. – 2017. – Volume
     19, Issue 1. – P. 1-7.
[15] Koch R. The 80/20 Principle. The Secret of Achieving More with Less. – Great Britain: Nicholas
     Brealey Publishing Limited, 1998. – 313 p.
[16] Rowley J. Understanding digital content marketing // Journal of Marketing Management. – 2008.
     – Volume 24, Issue 5-6. – P. 517-540.
[17] Orekhov S., Kopp A., Orlovskyi D. Map of Virtual Promotion of a Product. // Advances in
     Intelligent Systems, Computer Science and Digital Economics III. Switzerland: Springer, 2022. –
     P. 1-11.