<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>similarity matrices⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lyubomyr Chyrun</string-name>
          <email>Lyubomyr.Chyrun@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro Uhryn</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Ushenko</string-name>
          <email>y.ushenko@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Kalancha</string-name>
          <email>kalancha.artem@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>Universytetska Street 1 79000 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ruska Street 56, 46025, Ternopil</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Yuriy Fedkovych Chernivtsi National University</institution>
          ,
          <addr-line>Kotsiubynskoho Street 2 58012, Chernivtsi</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This study critically reviews the Temporal Semantic Influence (TSI) method, developed to assess the mutual influence between information sources. We propose an approach to its optimization and systematic parameter selection that uses the deviation of similarity matrices as the key performance criterion, on the premise that a larger deviation means the method distinguishes sources better. The effect of each main parameter of the algorithm (time factor weight, message horizon, similarity threshold, and similarity function) on the final results was analyzed and demonstrated visually. The optimized algorithm was then used to build Cascade Influence graphs, which were compared with those of the baseline version to assess the effect of the optimization. The conclusions indicate that the accuracy and stability of the TSI method can be improved, and they form a basis for further improvement of algorithms for analyzing information flows in network environments.</p>
      </abstract>
      <kwd-group>
        <kwd>similarity function</kwd>
        <kwd>TSI</kwd>
        <kwd>similarity matrix deviation</kwd>
        <kwd>influence graphs</kwd>
        <kwd>algorithm parameterization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Traditional methods based only on indicators of lexical similarity of messages no longer meet
modern requirements. They capture superficial similarity of content, but they cannot identify
the primary source of information, separate it from secondary relays, or track the temporal
patterns of its distribution.</p>
      <p>
        Under such conditions, it becomes particularly important to develop and deploy systems that can
automatically process large amounts of text data, integrate quantitative and temporal
characteristics of messages, and identify structural and hidden connections between
channels. New algorithmic approaches and analysis methods not only reduce the load on human
analysts, but also significantly increase the accuracy and speed with which key information
nodes are identified. That is why the development of methods that combine semantic and temporal
analysis is today considered one of the priority areas of research in natural language
processing [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], open source
monitoring, and the study of information flows.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p>Traditional approaches to comparing information sources are mainly based on the analysis of the
lexical content of messages. The most common methods, in particular the use of cosine similarity
and other statistical metrics, allow assessing the degree of similarity of content between different
channels. However, such methods have significant limitations: they record only the fact of textual
similarity, but do not take into account the temporal dynamics of the appearance of messages. As a
result, it is impossible to determine which source is the initiator of the dissemination of
information, and which acts as its relay.</p>
      <p>
        This aspect is of particular importance in the conditions of modern information confrontations,
when the speed of reaction to a message becomes a key indicator of the influence of the source.
Ignoring the time dimension leads to a distortion of the real picture of the interaction between
channels, since the order of publications, the delay in the dissemination of similar messages and
their sequence are not taken into account. As a result, the analyst receives only a static “snapshot”
of content similarity, devoid of the critical context that determines the actual strength and
direction of information influence. During periods of intense disinformation campaigns, this
becomes a serious drawback, as the speed of transmission and the efficiency of publications often
shape public opinion faster than the content of individual messages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        To overcome these limitations, the study proposes to extend the analysis with an
additional temporal dimension. It allows us to assess not only the semantic proximity of
messages from different channels, but also the time difference between their appearance. This
approach lets us quantify how quickly one source picks up information from another and,
accordingly, describe the degree and direction of mutual information influence. Integrating
this dimension into analytical algorithms opens up new opportunities for identifying key nodes in
the information network, clustering sources [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], classifying sources [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], predicting the spread of
content, and early identification of coordinated information operations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Purpose of the article</title>
      <p>In a previous study, the Timed Semantic Influence (TSI) method was developed and tested, which
integrates the analysis of semantic similarity and time intervals between message publications. The
application of this approach to a sample of Ukrainian news Telegram channels demonstrated its
effectiveness in determining the direction and intensity of information influence between sources.
At the same time, the results obtained were based on fixed algorithm parameters and a classical
approach to assessing textual similarity, which significantly limits the potential for further
improvement of the method.</p>
      <p>The purpose of this work is to optimize the TSI algorithm, with an emphasis on the selection and
systematic adjustment of its key parameters (similarity threshold, time horizon, weight
coefficient α, and others) in order to increase the accuracy and stability of information influence
assessment. In addition, the study assesses how the algorithm can be modified by
using alternative functions for measuring message similarity, in particular contextual models and
modern metrics for comparing short texts. Such an extension should not only improve the
quality of the classification of relationships between channels, but also make it possible to test
the universality of the proposed approach on various data sets, supporting its scalability and
relevance for a wide range of information flow analysis tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Timed Semantic Influence algorithm</title>
      <p>The first stage of the TSI algorithm is the selection of two information sources between which
the mutual influence is analyzed. In this study, such sources are individual Telegram
channels. For convenience, they are designated as channel A (first channel) and channel B (second
channel). The choice of channels must be justified, since the reliability,
reproducibility and interpretability of further conclusions depend on it. In the selected time
interval, both channels must be sufficiently active in publishing messages; otherwise the
final TSI coefficients may turn out to be statistically unstable and lose their analytical value.</p>
      <p>The second stage involves the selection of a specific message from channel A, which serves as
the basic unit for comparison. This message is designated as M and must be clearly identified by
the date and time of publication, since the algorithm is based on the precise measurement of time
intervals.</p>
      <p>Next, an array of messages N from channel B that can potentially correspond to message M is
formed. This array is bounded by the parameter horizon_hours (event horizon), i.e. ± h
hours from the publication time of M. Messages from channel B that fall outside this time range
are filtered out. The parameter h is one of the key parameters for tuning the algorithm, since it
determines the sensitivity to short-term influences and the level of noise in the sample.</p>
      <p>After forming the array N, a search is performed for a relevant pair of messages (the base M
from channel A and the potential m from channel B). For this, semantic similarity is calculated
based on vector representations of the texts. The similarity threshold similarity_threshold (S) is set
as a parameter of the algorithm. If no message from N exceeds S, message M is skipped. Tuning the
parameter S is critically important: too high a threshold can cut off relevant pairs, while too low a
threshold can create an excessive number of false matches.</p>
      <p>In the basic implementation, all messages from the array N are examined and the pair with
the maximum semantic similarity is selected. Alternatively, the messages can be sorted by
temporal proximity and checked for similarity in ascending order of Δt. Testing both approaches
is one of the tasks of further optimization of the algorithm. At the next stage, the TSI
coefficient is calculated for each selected pair of messages. It integrates the semantic
proximity of the messages and the time interval Δt between their publication.
The study considers the influence decay function:</p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>TSI = sm(m_a, m_b) \cdot e^{-\alpha \frac{\Delta t}{60}}</tex-math>
        </disp-formula>
In this expression, sm(m_a, m_b) denotes the cosine similarity between the texts, Δt is the absolute
difference in minutes between the publications (dividing by 60 expresses it in hours), and α is a
parameter that regulates the weight of the time component. The choice of α shifts the balance
between the semantic and time components of influence. The algorithmic complexity of the proposed
method is estimated as O(log n), where n denotes the size of the channel array. This estimate is
based on the fact that, during the comparison process, each pair of messages between two channels
is evaluated only once. The value for the reverse direction can be derived from the results of the
previous computations.</p>
      <p>After calculating the TSI coefficients for all relevant pairs, the direction of influence is
determined by the chronology of publications. Then the obtained values are aggregated for each
direction, forming the integral indicator Total TSI as the sum of all atomic TSI values, which is
recorded in the matrix of mutual influence of channels and serves as the basis for further
quantitative and visual analysis of information interaction.</p>
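      <p>The stages described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the study: the function names (tsi_score, best_match, total_tsi), the message layout (a dictionary with "time" and "text" keys), and the inclusive threshold comparison are hypothetical choices, and the similarity function is passed in as a parameter.</p>
      <preformat><![CDATA[
```python
import math
from datetime import datetime, timedelta


def tsi_score(similarity, delta_minutes, alpha):
    """Atomic TSI for one pair: similarity damped by exp(-alpha * dt / 60)."""
    return similarity * math.exp(-alpha * delta_minutes / 60.0)


def best_match(base, candidates, sim, horizon_hours, threshold):
    """Most similar candidate within +/- horizon_hours of the base message.

    Returns (message, similarity), or None when no candidate reaches the
    threshold (the comparison is inclusive here, which is an assumption).
    """
    horizon = timedelta(hours=horizon_hours)
    best = None
    for cand in candidates:
        if abs(cand["time"] - base["time"]) > horizon:
            continue  # outside the event horizon
        s = sim(base["text"], cand["text"])
        if s >= threshold and (best is None or s > best[1]):
            best = (cand, s)
    return best


def total_tsi(channel_a, channel_b, sim, alpha=0.1, horizon_hours=8, threshold=0.4):
    """Sum atomic TSI values per direction; the earlier publication leads."""
    totals = {"A->B": 0.0, "B->A": 0.0}
    for base in channel_a:
        match = best_match(base, channel_b, sim, horizon_hours, threshold)
        if match is None:
            continue  # message M is skipped
        cand, s = match
        dt_min = abs((cand["time"] - base["time"]).total_seconds()) / 60.0
        direction = "A->B" if base["time"] <= cand["time"] else "B->A"
        totals[direction] += tsi_score(s, dt_min, alpha)
    return totals
```
]]></preformat>
      <p>The two totals returned for a pair of channels would fill the two corresponding cells of the mutual influence matrix.</p>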
    </sec>
    <sec id="sec-5">
      <title>5. Method optimization and tuning</title>
      <p>
        The TSI algorithm, first implemented in our previous studies, has proven its effectiveness in
detecting mutual information influence between sources. In this work, it has undergone significant
optimization: the introduction of caching of intermediate calculations and the review of key text
operations have significantly reduced the computational complexity and increased the speed when
processing large samples of messages. The next stage is the parameterization and tuning of the
algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We investigate how the final results depend on the coefficient α (the
weight of the time component in the formula), the message horizon (horizon_hours), and the
similarity threshold (similarity_threshold). In the experiments, α varied from 0.1
to 0.9, horizon_hours from 2 to 8 hours, and similarity_threshold from 0.4 to 0.8. In addition, the
algorithm was configured to use different functions for assessing similarity between messages,
which allowed us to compare the effectiveness of alternative metrics. In total, 315 combinations of
parameters were generated and the algorithm was run for each of them.
      </p>
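      <p>A parameter grid of this kind could be generated as in the sketch below. The step sizes (0.1 for α, 1 hour for horizon_hours, 0.1 for similarity_threshold) are assumptions, since the text gives only the ranges. With these steps the three numeric parameters yield 9 · 7 · 5 = 315 combinations; how the similarity function enters the reported count is not stated, so it is left out of the grid here.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Ranges come from the text; the step sizes are assumptions.
alphas = [round(0.1 * k, 1) for k in range(1, 10)]        # 0.1 .. 0.9
horizons = list(range(2, 9))                               # 2 .. 8 hours
thresholds = [round(0.4 + 0.1 * k, 1) for k in range(5)]   # 0.4 .. 0.8

grid = [
    {"alpha": a, "horizon_hours": h, "similarity_threshold": s}
    for a, h, s in product(alphas, horizons, thresholds)
]

print(len(grid))  # 9 * 7 * 5 = 315
```
]]></preformat>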
      <p>
        Particular attention was paid to similarity functions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], since they determine the method of
measuring the similarity between two messages:
      </p>
      <p>
        Cosine Similarity is a classic method in which texts are considered as vectors in a
multidimensional space (for example, token frequencies), and similarity is calculated as the cosine
of the angle between them, see (2). A value approaching 1 indicates greater similarity of texts [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
The advantages are high calculation speed and interpretability of results; the disadvantage is
that word order and context are ignored.
        <disp-formula>
          <label>(2)</label>
          <tex-math>cs(s_i, s_j) = \frac{z(s_i)^{T} z(s_j)}{\lVert z(s_i) \rVert_2 \, \lVert z(s_j) \rVert_2} \in [-1, 1]</tex-math>
        </disp-formula>
where z(s_i) is the vector representation of message s_i.
      </p>
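      <p>As a minimal sketch of equation (2), the function below builds token-frequency vectors with whitespace tokenization (an assumption, since the study does not describe its vectorization in this section) and returns their cosine. For non-negative count vectors the value lies in [0, 1]; the full range [−1, 1] applies to general real-valued vectors.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between token-frequency vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction
    return dot / (norm_a * norm_b)
```
]]></preformat>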
      <p>
        Jaccard Similarity is a measure based on the ratio of the number of shared unique tokens to
their union [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This approach works well for short texts or headlines and allows us to assess the
degree of thematic overlap, but is less sensitive to the number of repetitions and stylistic nuances,
see (3).
      </p>
      <p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>J(x, y) = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} \max(x_i, y_i)}</tex-math>
        </disp-formula>
where x and y are the lists of fuzzy matching scores.</p>
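      <p>Equation (3) is the weighted form of the Jaccard index. The sketch below implements it for two equal-length lists of non-negative scores and, as an illustration, shows that the token-set version described in the text is the special case of 0/1 indicator vectors. The names weighted_jaccard and token_jaccard and the whitespace tokenization are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
def weighted_jaccard(x, y):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    denom = sum(max(a, b) for a, b in zip(x, y))
    if denom == 0:
        return 0.0  # both vectors are all zeros
    return sum(min(a, b) for a, b in zip(x, y)) / denom


def token_jaccard(text_a, text_b):
    """Set Jaccard expressed through indicator vectors over the joint vocabulary."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    vocab = sorted(set_a | set_b)
    x = [1 if t in set_a else 0 for t in vocab]
    y = [1 if t in set_b else 0 for t in vocab]
    return weighted_jaccard(x, y)
```
]]></preformat>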
      <p>
        Fuzzy-matching Similarity is a method of approximate token comparison (e.g., via RapidFuzz)
that finds the best matches for each word in another message, taking into account orthographic
and morphological differences, see (4). This approach is useful when texts may contain minor
discrepancies or errors [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Its advantage is higher tolerance to “noisy” data.
      </p>
      <p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>FuzzSim(T_1, T_2) = \frac{1}{2} \left( \frac{1}{n} \sum_{i=1}^{n} \max_{1 \le j \le m} fuzz(t_{1,i}, t_{2,j}) + \frac{1}{m} \sum_{j=1}^{m} \max_{1 \le i \le n} fuzz(t_{2,j}, t_{1,i}) \right)</tex-math>
        </disp-formula>
where fuzz(x, y) ∈ [0, 1] is the normalized fuzzy string similarity score between two tokens,
n = |T_1| and m = |T_2| are the numbers of tokens in the two messages, and t_{1,i}, t_{2,j} are
individual tokens.
      </p>
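      <p>Equation (4) can be illustrated with the sketch below. To keep the example free of third-party dependencies, the token-level score fuzz is computed with difflib.SequenceMatcher from the Python standard library instead of RapidFuzz; this is a substitution made for illustration only, and the scores of the two libraries will generally differ.</p>
      <preformat><![CDATA[
```python
from difflib import SequenceMatcher


def fuzz(a, b):
    """Normalized string similarity in [0, 1] (stand-in for a RapidFuzz scorer)."""
    return SequenceMatcher(None, a, b).ratio()


def fuzz_sim(tokens_1, tokens_2):
    """Symmetric average of best-match scores between two token lists."""
    if not tokens_1 or not tokens_2:
        return 0.0
    # For every token of the first message, take its best match in the second.
    forward = sum(max(fuzz(t1, t2) for t2 in tokens_2) for t1 in tokens_1) / len(tokens_1)
    # And the same in the opposite direction.
    backward = sum(max(fuzz(t2, t1) for t1 in tokens_1) for t2 in tokens_2) / len(tokens_2)
    return (forward + backward) / 2.0
```
]]></preformat>
      <p>Averaging both directions keeps the measure from favoring the shorter message, which a one-directional best-match average would do.</p>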
      <p>Using these three similarity functions in different combinations with α,
horizon_hours, and similarity_threshold allowed us to investigate how each configuration affects the
stability and quality of the interaction matrices. Thus, after describing the algorithm, we moved on
to the stage of systematic tuning, which was an important step in increasing the accuracy and
flexibility of TSI.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Interpretation of tuning results</title>
      <p>To empirically test the improved TSI algorithm, a limited data sample was used – messages from
eight Ukrainian Telegram information channels over a ten-day period. This approach provided the
possibility of quickly running the algorithm multiple times with different parameters, which made
it possible to simultaneously control the quality of the obtained results and the time of calculations
on an identical data set.</p>
      <p>
        At the output, the TSI algorithm forms a matrix of similarity/interaction between information
sources for each combination of parameters. To compare the results of different settings, we used a
statistical estimate of the spread of the elements of this matrix, namely their absolute deviation
from the mean. The main idea is that a larger deviation indicates a better ability of the algorithm
to differentiate the degrees of mutual similarity of channels: if all similarity values are close to
each other, the matrix is uninformative; if the values differ significantly, this indicates more
pronounced information connections [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>To do this, the main diagonal (elements of the form (i,i), which correspond to the comparison
of a source with itself) is excluded from each matrix, since the similarity there is always equal
to 1 or the maximum possible value. The deviation is then calculated by the formula:</p>
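      <p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>D(M) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left| m_{i,j} - \mu \right|, \quad i \neq j</tex-math>
        </disp-formula>
where n is the matrix size, m_{i,j} is a matrix element, and μ is the mean value of the matrix.</p>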
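      <p>The deviation criterion of formula (5) can be sketched as follows for a square matrix given as a list of rows. Two details are assumptions of this example rather than statements of the study: the mean is taken over the off-diagonal elements only, and the function name matrix_deviation is hypothetical. The normalization by n follows the formula as printed.</p>
      <preformat><![CDATA[
```python
def matrix_deviation(matrix):
    """Sum of absolute deviations of off-diagonal elements from their mean, divided by n."""
    n = len(matrix)
    # Exclude the main diagonal (self-comparison of each source).
    off_diag = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    if not off_diag:
        return 0.0
    mu = sum(off_diag) / len(off_diag)  # assumption: mean over off-diagonal cells
    return sum(abs(v - mu) for v in off_diag) / n
```
]]></preformat>
      <p>A matrix whose off-diagonal values are all equal gives a deviation of zero, which corresponds to the uninformative case described above.</p>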
      <p>The interpretation of this indicator (see formula 5) is straightforward: the larger the deviation,
the more clearly the algorithm distinguishes channels by the degree of their similarity or influence
for the given parameters. This approach allows us to objectively compare different combinations of
α, horizon_hours, similarity_threshold and similarity functions not only by the visual appearance of
the matrices, but also by a single quantitative criterion.</p>
      <p>In addition, the calculation time was recorded for each set of parameters, which made it possible
to combine the quality assessment (absolute deviation) with the performance (execution duration).
This is of particular importance for scaling the algorithm to large data sets and determining the
optimal balance between accuracy and speed.</p>
      <p>To test the influence of the horizon_hours parameter, a graph of the dependence of the average
TSI Score on the width of the time horizon was constructed. The graph shows (see Fig. 1) that with
an increase in the range of messages taken into account, the TSI Score value increases almost
linearly, but the increase is insignificant. This behavior indicates that the horizon parameter has
only a moderate effect on the final result, without causing sharp changes in the structure of the
similarity matrices.</p>
      <p>In other words, widening the horizon window gives a certain increase in informativeness,
but does not radically change the assessment of the mutual influence between sources. This makes it
possible to choose a wider horizon (for example, 8 hours) for a more complete coverage of relevant
messages without the risk of significantly distorting the results of the TSI algorithm.</p>
      <p>Analysis of the dependence of the average TSI Score on the value of the parameter α, which
regulates the weight of the time component in the TSI formula, showed a stable trend: with an
increase in α, the average TSI Score value gradually decreases (see Fig. 2). This decrease is not
sharp, but noticeable and reflects a decrease in the influence of the semantic component with an
increase in the weight of the time factor.</p>
      <p>When setting up the TSI algorithm, the similarity threshold parameter (similarity_threshold)
was set at 0.4 as the minimum allowable value, which ensures a basic level of meaningful
similarity between messages. Further analysis showed that as this threshold increases, the
integral TSI Score systematically decreases (see Fig. 3). This trend indicates excessive
selectivity of the algorithm at high thresholds: it selects fewer and fewer pairs of messages that
are considered similar, so the average TSI Score falls and the algorithm becomes worse at
distinguishing channels by the degree of influence.</p>
      <p>The results of the algorithm are affected not only by the “quality” of similarity between the
selected messages, but also by the number of pairs included in the calculation. The optimal
threshold value should balance these two factors – to provide a sufficient amount of data for
statistically reliable conclusions, provided that there is an acceptable level of semantic similarity.
Such a balance allows us to achieve both stability and sensitivity of the algorithm to real
information connections between sources.</p>
      <p>Special attention in the study is paid to the comparison of different similarity functions, since
this stage of the TSI algorithm is one of the most resource-intensive. The corresponding graph (see
Fig. 4) shows the dependence of the average TSI Score value and the algorithm execution time on
the selected similarity function.</p>
      <p>The obtained results demonstrate a clear picture (see Fig. 4). Jaccard Similarity turned out to be
the fastest among the three implemented functions, but provides the lowest TSI Score values,
which indicates the insufficient sensitivity of the metric to the semantic and stylistic nuances of the
texts. Cosine Similarity gives slightly better similarity indicators compared to Jaccard, but reveals a
high dependence on the α parameters and the similarity threshold: the calculation time can
increase significantly even on the same data. This is explained by the fact that the formation of
vector representations and the calculation of cosine similarity is one of the most computationally
expensive steps of the algorithm.</p>
      <p>The best results were obtained when using Fuzzy Similarity. This metric exhibits different TSI
Score values depending on the α, similarity threshold, and horizon parameters, but when properly
tuned, it provides a combination of high TSI Score values with short execution times. Thus, Fuzzy
Similarity turned out to be the most balanced similarity function for the improved TSI method,
providing the optimal ratio of quality and performance.</p>
      <p>Analysis of the final TSI Score values in different combinations of parameters (see Table 1)
showed that the most effective configuration is the one that uses Fuzzy Similarity (FUSE Similarity)
with an event horizon of 8 hours, a parameter value of α = 0.1 and a similarity threshold of 0.4. As
was demonstrated earlier, increasing the event horizon has only a minor effect on the final result,
but in this combination it provides maximum coverage of relevant messages without
compromising performance.</p>
      <p>This configuration gives the highest integral TSI Score among all tested options and at
the same time remains fast in execution. Thus, the combination of Fuzzy Similarity, a horizon of 8
hours, α = 0.1 and a threshold value of 0.4 can be considered the optimal balance between the
quality and performance of the TSI algorithm on the studied sample.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Cascade Influence graphs comparison</title>
      <p>
        Based on the intermediate results of the TSI algorithm, directed graphs of mutual influence of
channels were constructed. In these graphs, nodes correspond to information sources, and edges
reflect the detected direction and intensity of influence between them [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14–16</xref>
        ]. Influence was
determined based on the similarity of messages: if two channels have similar content, the presence
of a potential information connection is recorded.
      </p>
      <p>Formally, if channel A and channel B have at least two pairs of messages similar according to
the selected similarity function, and at the same time messages from channel A are published
earlier than the corresponding messages from channel B, it is concluded that channel A influences
channel B. In the opposite case, the direction of the edge changes, and the influence is recorded
from channel B to channel A.</p>
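      <p>The edge rule above can be sketched as follows. The input is a list of publication-time pairs (time in channel A, time in channel B) for messages already matched by the similarity function. The text does not say how mixed cases are resolved, so the majority vote used here, the handling of ties, and the name influence_edge are assumptions of this example.</p>
      <preformat><![CDATA[
```python
def influence_edge(pairs, min_pairs=2):
    """Return ("A", "B"), ("B", "A") or None for a list of (time_a, time_b) pairs."""
    if len(pairs) < min_pairs:
        return None  # too few similar pairs to record a connection
    a_first = sum(1 for t_a, t_b in pairs if t_a < t_b)
    b_first = sum(1 for t_a, t_b in pairs if t_b < t_a)
    if a_first == b_first:
        return None  # assumption: no edge when neither channel leads
    return ("A", "B") if a_first > b_first else ("B", "A")
```
]]></preformat>
      <p>The times may be numbers or datetime objects, since only their ordering is used.</p>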
      <p>The constructed cascade graphs of mutual influence became the basis for a comparative analysis
of the results of the basic and improved versions of the TSI method. In the previous configuration,
the leading positions in the filtered graphs were occupied mainly by three channels: UaOnlii
(≈39%), voynareal (≈27%) and truexanewsua (≈12%), while other sources showed significantly lower
rates of occurrences at the root.</p>
      <p>
        After applying the improved algorithm with the selected parameters, the picture became
significantly more detailed. The absolute numbers of occurrences at the roots of the graphs
increased significantly (for example, from 58 to 107 for UaOnlii and from 18 to 54 for
truexanewsua), and the distribution of percentages became more balanced. Although UaOnlii and
voynareal remained the leading sources, the share of other channels (truexanewsua, lachentyt,
susilnenews, kievreal1) increased noticeably, which indicates a better ability of the method to
recognize secondary but systematic influences [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>The key factor in this improvement was the modernization of the similarity function between
messages. Thanks to more accurate text matching, the algorithm generates significantly more
elements of influence between channels, which directly increases the number of occurrences in the
graph roots. This, in turn, increases the statistical reliability of estimates and the accuracy of
determining real information influence, since a larger number of relevant messages and
interactions are included in the calculation, creating a complete picture of the information network.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>In the course of the research, the TSI method, which was implemented in our previous
works, was optimized and improved. In this version, the algorithm received an improved function
for calculating similarity between messages, caching of intermediate results, and reduced
computational cost, which made it possible to increase the speed of operation.</p>
      <p>For each of the main parameters of the algorithm – the weight of the time factor (α), the
message horizon, the similarity threshold, and the message similarity function – a detailed analysis
of their influence on the final result was conducted. The evaluation was carried out on a limited
sample based on the deviation of similarity matrices. This criterion allowed us to objectively
compare different parameter configurations and choose the optimal combination.</p>
      <p>In addition, influence graphs between sources were constructed and analyzed, reflecting the
direction and strength of the potential information influence of channels on each other.
Comparison of the baseline and new results showed that the improved TSI forms significantly
more elements of influence between channels, increases the number of occurrences in the roots of
cascade trees, and provides a more accurate assessment of leading channels in the studied network.
As a result, the applied approach not only increased the performance and
informativeness of the method, but also produced a more detailed and reliable picture of
informational relationships between sources, which opens up prospects for further research and
for extending the method to larger data samples.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-5 and Grammarly for grammar
and spelling checks and as a smart search engine to find related works based on the context of
the conversation. After using these tools and services, the authors reviewed and edited the content
as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <source>Telegram APIs</source>
          ,
          <year>2025</year>
          . URL: https://core.telegram.org/api
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cresci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Di</given-names>
            <surname>Pietro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>A survey on computational propaganda detection</article-title>
          ,
          <source>in: 29th International Joint Conference on Artificial Intelligence (IJCAI 2020)</source>
          ,
          <year>2020</year>
          . https://doi.org/10.24963/ijcai.2020/672
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Johri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Khatri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Al-Taani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabharwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suvanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Natural language processing: history, evolution, applications and future work</article-title>
          ,
          <source>in: 3rd Int. Conf. on Computing Informatics and Networks</source>
          ,
          <series>LNNS</series>
          , Springer,
          <year>2021</year>
          ,
          <fpage>365</fpage>
          -
          <lpage>375</lpage>
          . https://doi.org/10.1007/978-981-15-9712-1_31
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hacid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Favre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zighed</surname>
          </string-name>
          ,
          <article-title>Information diffusion in online social networks: a survey</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>42</volume>
          (
          <year>2013</year>
          )
          <fpage>17</fpage>
          -
          <lpage>28</lpage>
          . https://doi.org/10.1145/2503792.2503797
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marchese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Text clustering with seeds affinity propagation</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>23</volume>
          (
          <year>2011</year>
          )
          <fpage>627</fpage>
          -
          <lpage>637</lpage>
          . https://doi.org/10.1109/TKDE.2010.144
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Janani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vijayarani</surname>
          </string-name>
          ,
          <article-title>Text document clustering using spectral clustering with PSO</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>134</volume>
          (
          <year>2019</year>
          )
          <fpage>192</fpage>
          -
          <lpage>200</lpage>
          . https://doi.org/10.1016/j.eswa.2019.05.030
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Dogra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <surname>Kavita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shafi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Ijaz</surname>
          </string-name>
          ,
          <article-title>A complete process of text classification using state-of-the-art NLP models</article-title>
          ,
          <source>Computational Intelligence and Neuroscience</source>
          <volume>2022</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          . https://doi.org/10.1155/2022/1883698
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Meulder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Naudts</surname>
          </string-name>
          ,
          <article-title>Combined optimization of feature selection and algorithm parameters in machine learning of language</article-title>
          ,
          <source>in: ECML 2003</source>
          , LNCS 2837, Springer,
          <year>2003</year>
          . https://doi.org/10.1007/978-3-540-39857-8_1
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Magara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Ojo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zuva</surname>
          </string-name>
          ,
          <article-title>A comparative analysis of text similarity measures in recommender systems</article-title>
          ,
          <source>in: ICTAS 2018</source>
          , IEEE,
          <year>2018</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . https://doi.org/10.1109/ICTAS.2018.8368766
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Combining cosine similarity with classifier for text classification</article-title>
          ,
          <source>Applied Artificial Intelligence</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>396</fpage>
          -
          <lpage>411</lpage>
          . https://doi.org/10.1080/08839514.2020.1723868
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zahrotun</surname>
          </string-name>
          ,
          <article-title>Comparison of Jaccard, Cosine and combined similarity in SNN clustering</article-title>
          ,
          <source>ComEngApp Journal</source>
          <volume>5</volume>
          (
          <year>2016</year>
          ). https://doi.org/10.18495/COMENGAPP.V5I1.160
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalluru</surname>
          </string-name>
          ,
          <article-title>Enhancing data accuracy and efficiency using fuzzy matching</article-title>
          ,
          <source>International Journal of Science and Research</source>
          (
          <year>2023</year>
          ). https://doi.org/10.21275/sr23805184140
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shalileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mirkin</surname>
          </string-name>
          ,
          <article-title>Least-squares community extraction in feature-rich networks</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>16</volume>
          (
          <year>2021</year>
          ). https://doi.org/10.1371/journal.pone.0254377
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          et al.,
          <article-title>Effects of time-dependent diffusion on rumor spreading in social networks</article-title>
          ,
          <source>Physica A</source>
          <volume>452</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . https://doi.org/10.1016/j.physleta.2016.04.025
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lozynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Uhryn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vovk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ushenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Decision support in GIS using swarm intelligence</article-title>
          ,
          <source>Modern Education and Computer Science</source>
          <volume>2</volume>
          (
          <year>2023</year>
          )
          <fpage>62</fpage>
          -
          <lpage>72</lpage>
          . https://doi.org/10.5815/ijmecs.2023.02.06
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vladov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Muzychuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekunenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Basko</surname>
          </string-name>
          ,
          <article-title>Intelligent Method for Generating Criminal Community Influence Risk Parameters Using Neural Networks and Regional Economic Analysis</article-title>
          ,
          <source>Algorithms</source>
          <volume>18</volume>
          :
          <issue>8</issue>
          (
          <year>2025</year>
          )
          <elocation-id>523</elocation-id>
          . https://doi.org/10.3390/a18080523
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vladov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokurenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Muzychuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <article-title>The Intelligent Data Measurement System Using Neural Network Technologies and Fuzzy Logic Under Operating Implementation Conditions</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>8</volume>
          :
          <issue>12</issue>
          (
          <year>2024</year>
          )
          <elocation-id>189</elocation-id>
          . https://doi.org/10.3390/bdcc8120189
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>