<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Clustering in the Recommendation System of Social Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yelyzaveta Meleshko</string-name>
          <email>elismeleshko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Yakymenko</string-name>
          <email>m.yakymenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Semenov</string-name>
          <email>s_semenov@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Ukrainian National Technical University</institution>
          ,
          <addr-line>8, Universytetskyi prosp., Kropyvnytskyi, 25006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University "Kharkiv Polytechnic Institute"</institution>
          ,
          <addr-line>2, Kyrpychova str., Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>This paper proposes a method for detecting a network of bots in the recommendation system based on graph clustering and analysis of user actions. This method is proposed to be used if there are signs of an attack on the recommendation system and a set of probable targets of the attack is revealed. A series of experiments was carried out to test the effectiveness of the proposed method. The input data for the experiments were generated in the developed software simulation model of users and items of the recommendation system, which made it possible to simulate the influence of external destabilizing factors on the system, such as information attacks. A subsystem of information security of the recommendation system has been developed. This subsystem consists of a method of detecting an information attack on the recommendation system based on trend analysis in item ratings and a method of detecting bot networks in the recommendation system based on graph clustering and user action analysis. Appropriate algorithms have also been developed.</p>
        <p>Keywords: recommendation systems, data analysis, clustering, information attacks, bot networks, graphs, algorithms, computer simulation, social networks, websites</p>
        <p>COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22-23, 2021, Kharkiv, Ukraine</p>
        <p>ORCID: 0000-0001-8791-0063 (Ye. Meleshko); 0000-0003-3290-6088 (M. Yakymenko); 0000-0003-4472-9234 (S. Semenov)</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Most often information attacks on recommendation systems are carried out for marketing purposes
in order to increase the ratings of the attacking party's products or decrease the ratings of competitors'
products [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1-6</xref>
        ]. Attacks on recommendation systems
may also be aimed at propagating
informational influences during informational conflicts, e.g., for political purposes.
Various
information influences are often carried out through social networks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref7 ref8 ref9">7-12</xref>
        ]. Recommendation systems
as their components have become one of the targets for information attacks in order to implement
such influences [
        <xref ref-type="bibr" rid="ref1 ref13 ref2 ref3 ref4 ref5 ref6">1-6, 13-15</xref>
        ]. By successfully attacking a social network recommendation system or
news aggregator one can change the content and order in which items of news feed are displayed to
system users. It can be used not only for marketing but also for political or fraudulent purposes. To
implement attacks on recommendation systems bot networks are used, since only a certain set of
profiles in the system can influence recommendations formation by their joint actions [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1-6, 14, 15</xref>
        ].
      </p>
      <p>
        Mostly in existing studies [
        <xref ref-type="bibr" rid="ref4">4, 14-17</xref>
        ] it is proposed to consider the detection of an information
attack on the recommendation system as identical to the detection of bot profiles, and for detecting a
bot network, all user profiles are clustered based on their statistics by non-hierarchical methods (for
example, k-means clustering). After such clustering, bots and authentic users get into different
clusters.</p>
      <p>2021 Copyright for this paper by its authors.</p>
      <p>Since the detection of bot profiles is quite a resource-intensive task, this paper proposes to split the
task of protecting the recommendation system from information attacks into two parts: 1) detection of
attack signs and 2) detection and neutralization of bot profiles.</p>
      <p>Attack detection can be a less resource-intensive task if it is realized as tracking the dynamics
of ratings of system items (all ratings, or only those critical from the point of view of information
security). For example, if the ratings of protected items start to change rapidly, and the new ratings that
drive these changes do not match the previous average ratings of these items, one should check for
bots among the users who made such ratings. This approach reduces the number of user profile
checks: first, because profiles need to be checked only if an attack is suspected; second, because
only the profiles of users who carry out suspicious actions need to be checked, not those of all users.</p>
      <p>If in the system items with an abnormal change in rating trends are detected (that could
hypothetically be the result of a botnet attack), the next logical step is to try to identify user profiles that
influenced this change and try to find out if they are connected to a coordinated network or networks.</p>
      <p>It is natural that changes in trends in the ratings of items are influenced by all users who gave them
ratings equal to the target – high if the item's ratings increased or low if the item's ratings decreased.
The set of such users is easily identified by simple queries to the recommendation system database.
But it is much more difficult to determine which user profiles are really part of the botnet and perform
coordinated actions with other members of the botnet and who simply gave a rating that corresponded
to their preferences.</p>
      <p>Thus, the purpose of this work is to develop a method of detecting bot networks in the
recommendation system of a social network, which could be used only after detecting signs of an attack
to check only those user profiles that interacted with possible attack objects.</p>
      <p>
        The problem of botnet revealing among the profiles of system users can be reduced to the problem
of finding a subgraph in the social graph of the recommendation system. The vertices in such a
subgraph will be user profiles linked by joint actions that influenced changes in the rating trends of
all or most items that are probable targets of the information attack [
        <xref ref-type="bibr" rid="ref4">4, 18</xref>
        ]. Therefore, to develop a method for
detecting a bot network we decided to use methods of graph clustering.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Development of a method for detecting bot networks in the recommendation system</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Detection of signs of information attacks on the recommendation system</title>
      <p>Since it is proposed in this paper to split the problem of protecting the social network
recommendation system from information attacks into two parts – detecting signs of an attack and
searching for bot profiles – we will first consider how to solve the first problem before developing a
method for solving the second one.</p>
      <p>In [19] a method to determine the set of probable goals of the information attack of the bot
network by analyzing the trends of ratings of items of the recommendation system was proposed.
Let's consider the basic principles of this method.</p>
      <p>The set of possible targets of the attack will be denoted as G.</p>
      <p>The presence of an information attack on the recommendation system causes a change in the
ratings of a group of items (increases or decreases them), but the change in the ratings of items is not
a sufficient reason to think that there is an attack, since ratings may change due to the actions of
authentic users. Therefore, a number of additional features which may indicate an information attack
have been selected. A set of indicators to determine the presence or absence of an information attack
on the item of the recommendation system was proposed as follows:</p>
      <p>K_i = {tr_i, t̂r_i, D_i^r, D_i^t, N_i^r, N_i^tar, N_i^rec}, (1)
where tr_i is the trend of the rating dynamics of item i, which can take the values {–1, 0, 1} –
respectively "rating downward trend", "no change" and "upward trend"; t̂r_i is a prediction for the trend
of the rating dynamics of item i, for example, based on the Hurst exponent [20]; D_i^r is the variance of
the ratings of item i; D_i^t is the variance of the time of grading for item i; N_i^r is the number of
ratings of item i at the studied time interval; N_i^tar is the number of target ratings for item i at the
studied time interval; and N_i^rec is the number of times item i got into user recommendation lists at
the studied time interval.</p>
      <p>We will assume that an attack on the recommendation system occurs when the ratings of one or
more items of the system are purposefully changed by the joint actions of a group of system user
profiles. The size of the damage from an attack does not always depend on the number of items:
a successful change of the rating of even one item can have great consequences if the item is
important and belongs, for example, to the social, political or medical sector.</p>
      <p>It is proposed to determine the presence and type of attack according to the following rules:</p>
      <p>Rule 1. If the item has any 5 of the following signs – an upward rating trend tr_i = 1, a Hurst
exponent H_i &gt; 0.73, D_i^r ≤ D_avg^r, D_i^t ≤ D_avg^t, N_i^r &gt; N_avg^r, N_i^tar &gt; N_avg^tar,
N_i^rec &gt; N_avg^rec (where the subscript avg denotes the average value of the corresponding
indicator over the items of the system) – then we believe that there is a high probability of a rating
upgrade attack for this item.</p>
      <p>Rule 2. If the item has any 5 of the following signs – a downward rating trend tr_i = −1, a Hurst
exponent H_i &gt; 0.73, D_i^r ≤ D_avg^r, D_i^t ≤ D_avg^t, N_i^r &gt; N_avg^r, N_i^tar &gt; N_avg^tar,
N_i^rec &lt; N_avg^rec – then we believe that there is a high probability of a rating downgrade attack
for this item.</p>
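<p>The two rules can be sketched as a small check over the indicator set (1). This is a minimal sketch under stated assumptions: the dictionary key names, the `averages` baseline, and the reading of 0.73 as a Hurst-exponent threshold are illustrative, not the paper's exact notation.</p>

```python
# Hypothetical sketch of Rules 1 and 2: an attack is flagged when at least
# 5 of the 7 signs hold. Key names and the `averages` baseline are
# illustrative assumptions, not the authors' exact notation.

def attack_signs(item, averages, hurst_threshold=0.73):
    """Return 'upgrade', 'downgrade' or None for one item's indicator dict."""
    up = [
        item["trend"] == 1,                                  # tr_i = 1
        item["hurst"] > hurst_threshold,                     # persistent trend
        item["rating_var"] <= averages["rating_var"],        # D^r below average
        item["time_var"] <= averages["time_var"],            # D^t below average
        item["n_ratings"] > averages["n_ratings"],           # N^r above average
        item["n_target_ratings"] > averages["n_target_ratings"],
        item["n_recommended"] > averages["n_recommended"],
    ]
    down = list(up)
    down[0] = item["trend"] == -1                            # tr_i = -1
    down[6] = item["n_recommended"] < averages["n_recommended"]
    if sum(up) >= 5:
        return "upgrade"
    if sum(down) >= 5:
        return "downgrade"
    return None
```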
      <p>Once the attack on the recommendation system is detected and the set G of probable targets of bots
is created, it is logical to investigate the profiles of all users who influenced the change of the rating
trends of items from this set G. In this work, precisely such actions are proposed
for the search of bot profiles.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Identification of bot profiles in the recommendation system</title>
      <p>To solve the problem of identification of bot profiles, a method based on graph
clustering and analysis of user actions using coefficients of "distrust" was developed in this paper.
The method consists of the following stages:</p>
      <p>Stage 1. Create a set S of suspicious user profiles, in which we place the profiles that gave
target ratings for items from the set G of probable targets of the attack.</p>
      <p>Stage 2. Assign to each user from the set S the label :suspicious and a coefficient of distrust
calculated by the following formula:
d_u = (1 / N_G) ∑_{i ∈ G} k_{u,i}, (2)
where k_{u,i} indicates the presence of a target rating from user u for item i belonging to the set G of
probable targets of the attack, taking the value 1 if the target rating exists and 0 in the absence of
such a rating from user u for item i; N_G is the number of items in the set G of probable targets of
the attack.</p>
      <p>Stage 3. For each pair of users u1 and u2 from the set S, where d_u1 and d_u2 both reach a given
distrust threshold, create an edge between them with the label :BotNet.</p>
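<p>Stages 1-3 can be sketched as follows, assuming the target ratings have already been extracted per user. The function names and the default value of the distrust threshold are hypothetical; the paper does not fix a specific threshold here.</p>

```python
from itertools import combinations

# Minimal sketch of Stages 1-3: `ratings` maps user -> set of items the user
# gave a target rating to; `targets` is the set G of probable attack targets.
# The threshold value is an assumption for illustration.

def distrust(ratings, targets):
    """Stage 2: distrust = share of probable targets a user rated with a target rating."""
    return {u: len(rated & targets) / len(targets) for u, rated in ratings.items()}

def botnet_edges(ratings, targets, threshold=0.5):
    """Stage 3: connect every pair of users whose distrust reaches the threshold."""
    d = distrust(ratings, targets)
    suspicious = [u for u, v in d.items() if v >= threshold]
    return {frozenset(p) for p in combinations(sorted(suspicious), 2)}
```

The resulting edge set is exactly the :BotNet subgraph on which Stage 4 performs graph clustering.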
      <p>Stage 4. Perform graph clustering for the subgraph containing vertices with labels :User and
:suspicious and edges with the label :BotNet. The results of such clustering will be as follows: all bots
will get into one large cluster if there is one botnet, or into several large clusters if there are several
botnets; authentic users will get into different clusters, each of which will contain one user or a
small number of users. It is also possible that a certain number of authentic users will get into a
cluster with bots, or that a group of similar and active authentic users will create a separate cluster
if their actions shift the ratings of some items.</p>
      <sec id="sec-4-1">
        <title>Stage 5. Define the largest clusters as possible bot networks</title>
        <p>Define the largest clusters, consisting of at least (N_min − ε) users, where N_min is the minimum
number of users that may affect the performance of the recommendation system (it depends on the
parameters of a particular system) and ε is the approximate value of the error when dividing user
profiles into clusters. We consider such a cluster (or clusters) as a possible bot network (bot networks).
For users who do not get into these clusters, remove the edges with the label :BotNet. Users who got
into the subgraph BotNet need to be further checked by analyzing the statistical characteristics of
their profiles, for example, using neural networks [21].
Also, for an additional check of profiles from the subgraph BotNet, one can search for certain features
characteristic of bots, for example, poorly completed personal data or extremely high activity (the
characteristics of bots depend on the specific system and can become known in the process of
collecting statistics during its operation). One common feature of bots in many systems is
the difference between the variance of ratings and the variance of time intervals
between ratings in bot profiles and the average values of the corresponding variances in the
profiles of system users. For a particular system, the features of bots can include: features of profile
registration, features of filling the profile with personal information, style of writing and content of
comments, the user's friends list, etc. After checking the statistics of the individual user profiles that
got into the subgraph BotNet, the subgraph has to be corrected by removing from it the users
identified by their statistics as authentic. If there is a cluster in which all users are recognized as
authentic, it should cease to be considered a bot network.</p>
        <p>Stage 6. Correct the set G after analyzing the user ratings from the subgraph BotNet. It is necessary
to check which items the users from the botnet have targeted in a coordinated way. Remove from G the
items that received no target ratings at all, or only a small percentage, from users identified as bots.
Add to the set G the items that received target ratings from all bots (or a large percentage of them).</p>
        <p>Research on the most well-known and frequently used methods of graph clustering was
conducted [22-27]. Consider the methods of graph clustering that can be used in the proposed method
of detecting a network of bots, and the basic principles of their operation.</p>
        <p>The Louvain (or Multilevel) method is a graph clustering method based on multilevel optimization
of the modularity function [23, 28]. Modularity is a numerical characteristic that describes the strength
of the division of a graph into modules and is one measure of the structure of networks or
graphs [23, 25, 26]. To assess modularity, the following formula can be used:</p>
        <p>Q = (1 / 2M) ∑_{i,j} (A_{ij} − k_i k_j / 2M) δ(c_i, c_j), (3)
where M is the number of edges in the graph; A is the adjacency matrix of the graph; k_i is the
number of edges adjacent to the vertex i; k_j is the number of edges adjacent to the vertex j;
and δ(c_i, c_j) is a delta function equal to one if c_i = c_j (the vertices are in the same cluster)
and zero otherwise.</p>
        <p>This value is equal to the difference between the fraction of edges within clusters for the
current splitting and the expected fraction if the edges were placed randomly [26].</p>
        <p>The value of modularity shows how pronounced the clusters are; it is:
• close to one for a partition with dense intra-cluster and sparse inter-cluster connections;
• equal to zero when all vertices are placed in one cluster;
• negative for unsuccessful partitions.</p>
        <p>In essence, the value of modularity can be used to assess the quality of splitting a graph into
clusters. A good partition is characterized by the fact that the number of internal links within
each cluster is greater than the number of its external links.</p>
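<p>A direct computation of modularity on a toy graph can illustrate this. The sketch below evaluates the sum over vertex pairs from an edge list and a vertex-to-cluster mapping; the representation and the function name are assumptions for illustration.</p>

```python
# Direct computation of modularity for a small undirected graph given as an
# edge list; `communities` maps vertex -> cluster id. Illustrative sketch,
# not tied to any particular library.

def modularity(edges, communities):
    nodes = sorted({v for e in edges for v in e})
    m = len(edges)
    deg = {v: 0 for v in nodes}
    adj = {(i, j): 0 for i in nodes for j in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
        adj[(a, b)] += 1          # count each undirected edge in both directions
        adj[(b, a)] += 1
    q = 0.0
    for i in nodes:
        for j in nodes:
            if communities[i] == communities[j]:
                q += adj[(i, j)] - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)
```

For two triangles joined by a single bridge edge, splitting along the bridge gives a positive modularity, while putting all six vertices into one cluster gives zero, matching the behaviour described above.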
        <p>The value of modularity characterizes not how much denser the intra-cluster connections of a given
partition are than the inter-cluster ones, but how much denser they are compared to some baseline
density. Therefore, a comparison is made with the "null hypothesis" that the edges are distributed
randomly, that is, that there are no patterns in the distribution of edge density within the graph.</p>
        <p>The principle of graph clustering algorithms based on modularity optimization is that, at each step
of the algorithm, each vertex is somehow assigned to a cluster, the modularity value is calculated,
and the graph vertices are redistributed between clusters so as to increase the modularity value. Such
algorithms stop when it is no longer possible to improve the value of modularity.</p>
        <p>The Louvain method consists of two parts. The first part is a greedy optimization of modularity.
In the second part, a new graph is created with meta-vertices in the form of the found clusters, and
with edges carrying the total weight of all edges going from one cluster to another (loops with the
total weights of connections inside each cluster are also created). Such a graph is called a metagraph.
The algorithm is then restarted on the new graph. This is one of the most well-known methods due to
its speed.</p>
        <p>The LabelPropagation method is a graph clustering method based on graph labeling. It divides the
graph into clusters as follows: each vertex in the graph belongs to the cluster to which most of its
neighbors belong; if there are several such clusters, one of them is chosen at random [23, 26, 28].
The method is based on the idea that a vertex belongs to the cluster to which the largest number of
its neighboring vertices belongs. Consider the principle of operation of this method. At the initial
point in time, every vertex is assigned a separate cluster. Each vertex receives the label or color of the
corresponding cluster. Then, for each vertex, it is checked which clusters its neighbors belong to, and
cluster membership is redistributed. Because of ties, it is important to change the order of
vertex traversal at each iteration. The algorithm ends when there is nothing left to change – all vertices
belong to the same clusters as most of their neighbors. To improve the results, one can use the
following trick: run the algorithm several times, save the result of each run, and choose the
best option for splitting the graph. The main advantage of this algorithm is its almost linear
complexity. Its disadvantage is that on noisy graphs it often combines all vertices into one cluster or
a small number of clusters.</p>
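<p>The procedure just described can be sketched in a few lines of pure Python with a seeded random traversal order. This is an illustrative sketch of the general LabelPropagation idea, not the implementation evaluated in the paper; the adjacency-dict representation is an assumption.</p>

```python
import random

# Compact sketch of label propagation: every vertex repeatedly adopts the
# label most common among its neighbours; the traversal order is shuffled
# each sweep and ties are broken at random.

def label_propagation(adj, seed=0, max_iter=100):
    rng = random.Random(seed)
    labels = {v: v for v in adj}           # each vertex starts in its own cluster
    for _ in range(max_iter):
        order = list(adj)
        rng.shuffle(order)                 # randomised traversal order
        changed = False
        for v in order:
            counts = {}
            for n in adj[v]:
                counts[labels[n]] = counts.get(labels[n], 0) + 1
            if not counts:
                continue                   # isolated vertex keeps its label
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            new = rng.choice(candidates)   # break ties at random
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break                          # converged: no vertex wants to move
    return labels
```

Because labels only travel along edges, vertices in different connected components can never end up with the same label, which is the behaviour one wants when the :BotNet subgraph splits into separate candidate botnets.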
      </sec>
    </sec>
    <sec id="sec-5">
      <title>2.3. Experiments to test the effectiveness of the proposed method on a software model</title>
      <p>A series of experiments was carried out to test the developed method of detecting a network of bots.</p>
      <p>Since there is no information on bot profiles in the available open datasets for testing
recommendation systems, it is necessary to create a software simulation model of users and
recommendation system items to generate datasets that will also contain bot profiles.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3.1. Software simulation model of users and items of the recommendation system</title>
      <p>A software simulation model of users and items of the recommendation system with the ability to
simulate information attacks has been developed. Let's consider the basic principles of work of the
developed software simulation model of users and items of the recommendation system.</p>
      <p>The main type of attacks on recommendation systems are profile injection attacks, which involve
the creation of a number of bot profiles (bot networks) that, by coordinated actions, change the ratings
and frequency of getting target items into recommendation lists.</p>
      <p>A study of existing models of attacks on recommendation systems was carried out in this paper.
A bot profile can contain the following types of ratings:
• Ratings for filler items, to simulate the actions of real users. The attacker does not try
to change these ratings, but instead tries to find values for them that are as close as possible to those
of the real target group of users he seeks to influence.
• Ratings for push-target items; these are the maximum ratings (or close to them) in the
system, for target items whose rating the attacker seeks to increase.
• Ratings for nuke-target items; these are the minimum ratings (or close to them) in the
system, for target items whose rating the attacker seeks to decrease.</p>
      <sec id="sec-6-1">
        <title>Random Attack</title>
        <p>The number of target items in a bot profile can vary from one up to the total number of target
items and may be contained only in the push set, only in the nuke set, or in both; the number of items
to fill the profile ranges from zero up to the number of items in the system.</p>
        <p>In bot profiles, the filler set will be filled with ratings for randomly selected items. Ratings for
the selected items will also be chosen randomly, but in such a way that they are close to the global
average score in the system; for example, a normal distribution with a mathematical expectation equal
to the global average can be used. The target item is given the maximal or minimal rating, depending
on the targets of the attack. The knowledge and effort required to carry out such an attack are
minimal – the global average score in many systems can be easily learned directly or through
indirect data. This attack is not very effective.</p>
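<p>A random-attack bot profile generator might look as follows. The 1-5 rating scale, the standard deviation, and the function signature are assumptions for illustration; the paper only specifies a normal distribution around the global average.</p>

```python
import random

# Illustrative generator for "random attack" bot profiles: filler items get
# ratings drawn from a normal distribution around the global average, and the
# target item gets the maximal (push) or minimal (nuke) rating.

def random_attack_profile(target, filler_items, global_mean, push=True,
                          scale=(1, 5), sigma=1.0, rng=random):
    clip = lambda r: min(max(r, scale[0]), scale[1])   # keep ratings on the scale
    profile = {i: clip(round(rng.gauss(global_mean, sigma))) for i in filler_items}
    profile[target] = scale[1] if push else scale[0]   # push or nuke the target
    return profile
```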
      </sec>
      <sec id="sec-6-2">
        <title>Average Attack</title>
        <p>This attack uses the individual average of each item's ratings to create the filler set. More
information needs to be gathered for this attack. However, an average attack can be successful even
with a small filler set, which reduces the amount of information that needs to be collected. But the
cost of such a reduction in the required data is a large number of profiles with almost the same set of
rated items, which will, of course, be easy to detect. This attack is more effective than a random one,
but it is practically ineffective against collaborative filtering algorithms based on the item-based
neighborhood model.</p>
        <p>The average attack requires a relatively large amount of knowledge about the statistics of the
actions of real users in the system. Reasonable defense of the recommendation system from such
attacks will make it difficult for the attacker to collect the necessary data. To bypass such defense,
other attacks are used, for which the requirements for the amount of knowledge are much lower.</p>
        <p>Consider existing attacks that require less knowledge than the average attack, but work more
effectively than a random attack.</p>
        <p>Bandwagon Attack</p>
        <p>The purpose of this attack is to associate the attacked item with a small number of items that are
often rated by users (let's call them widely known). The attacker creates bot profiles that contain a
filler set of ratings of widely known items. Such profiles are more likely to look like those of a large
number of users, as well-known items are those that are rated by many users. Data for such an attack
is quite easy to obtain: among the well-known items, several are chosen at random. These items are
given maximum ratings along with the target item. Some of the items in the filler set can be rated
randomly, for example as in a random attack, to vary the bot profiles. This is a fairly effective attack,
but like the average attack it becomes ineffective when used against collaborative filtering based on
the item-based neighborhood model.</p>
        <p>Segment Attack</p>
        <p>The basic idea of this attack is to change the rating of the item within a target group of users with
known or easily predictable preferences. The target item is rated only within a certain segment of
users, so that it gets recommended only to them. Otherwise, if the target item gets into
recommendations for users from other segments, it may start receiving low ratings from them, and the
number of such low ratings may be greater than the artificially inflated ones. To carry out such an
attack, an attacker needs to find real users who belong to the target segment and collect data on the
ratings that they usually give to system items. As with the bandwagon attack, it is usually determined
which items in the target segment are widely known. These items are assigned the maximal rating
together with the target item.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Attacks to decrease ratings and attack classification</title>
        <p>To ensure maximum effect from the attack, some items for the filler set are randomly selected and
receive minimal ratings, which allows the attacker to make the bot profiles varied. This attack is
effective against collaborative filtering algorithms based on the item-based neighborhood model.</p>
        <p>It should be noted that all of the attack models discussed above can be used to decrease item
ratings, but there are specialized attacks that work better than others specifically at decreasing
ratings.</p>
        <p>Consider attack models designed to decrease the ratings of recommendation system items.</p>
        <p>Love/Hate Attack</p>
        <p>This attack is very simple – no knowledge is required. The target item is given a minimal rating,
and the items for the filler set are chosen randomly and receive maximal ratings. Despite its
extreme simplicity, this is one of the most effective downgrade attacks against collaborative filtering
algorithms based on the user-based neighborhood model.</p>
        <p>Reverse Bandwagon Attack</p>
        <p>This is a variation of the above-described bandwagon attack that selects, for the filler set, widely
known items that are rated low by the vast majority of users. These items are rated low in bot
profiles, and low ratings are also assigned to the target item. Thus, the target item begins to be
associated with items that are not liked by a large number of users, and this increases the probability
that low ratings will be predicted for the item and that it will not be included in the lists of
recommendations. Although this attack is not as effective against user-based systems as the
knowledge-heavy average attack, it is a very effective downgrade attack against item-based systems.</p>
        <p>Low-knowledge attacks use widely known items to fill a bot's profile with ratings for them. In this
way, an attacker can create a profile similar to that of the average user by examining the ratings of
only widely known items. If an attacker knows exactly which algorithm the recommendation system
uses, he can gather more information to attack.</p>
        <p>Thus, attacks can be classified into:</p>
        <p>Attacks with little knowledge – this type of attack does not require detailed knowledge of the
distribution of scores in the system. It requires system-independent knowledge that can be easily
obtained through public sources of information.</p>
        <p>Attacks with a lot of knowledge – an attacker needs to have as much knowledge as possible
about the system's algorithms and the distribution of ratings of system items. For example, some
attacks require the attacker to know the mean and standard deviation for each item in the system.
An example of an attack with a lot of knowledge is a popular attack.</p>
        <p>Popular Attack</p>
        <p>Assume that the system uses a standard user-based collaborative filtering algorithm, where
similarity between users is determined by the Pearson correlation. As in the bandwagon attack, the
filler set is filled using well-known system items. However, this alone does not guarantee high
similarity between the bot profile and real profiles. Therefore, the popular attack uses the average
values of the ratings of the selected well-known items to fill the filler set. In order to vary the profile
data, some randomly selected filler items are assigned a low rating or a low rating plus one,
depending on whether the average rating for the item is higher or lower. This strategy leads to
positive correlations between bot profiles and authentic profiles. It does not take a lot of knowledge to
identify well-known items, but a lot of information needs to be gathered to determine the average
ratings of the selected items. A popular attack can also easily be tuned for downgrade attacks. This
attack can be detected by comparing profiles in the system – the bot profiles of one bot network will
be very similar.</p>
        <p>Probe Attack</p>
        <p>The more similar bot ratings are to those of real users, the more difficult it is to recognize bot
profiles. Knowledge of the real preferences of different segments of users can be obtained from the
system itself through a probe attack. To carry out this attack, the attacker creates a seed profile and
then uses it to obtain recommendations from the system. These recommendations are created on the
basis of system information about real users, so using the received recommendations allows the
attacker to create bot profiles more similar to real users. The attacker can probe a small number of
users to then affect a small group, as in a segment attack, or a large part of them – to obtain
information, for example, for an average attack. An attacker needs only a small number of seed
profiles in order for the recommendation system to provide him with the necessary information in the
form of recommendations.</p>
        <p>Our studies of models of information attacks on recommendation systems have shown that the
easiest to implement and least resource-intensive attacks are the random and average ones, and the
most effective and inconspicuous (although quite resource-intensive) attack is the popular attack. The
other studied attack models are in fact modifications of these, designed, for example, to attack only a
certain segment (segment attack), to attack only to lower ratings (love/hate attack), or to reduce
resource consumption at the cost of the botnet's inconspicuousness (bandwagon attack, love/hate
attack). Therefore, in the software simulation model of users and items of the recommendation
system, we decided to model botnets based on the three main models of attacks – the random, average
and popular attacks.</p>
        <p>[Figure: Models of profile injection attacks on recommendation systems – random, probe, average, segment, popular, bandwagon and love/hate attacks.]</p>
        <p>The proposed method of software simulation of users and items of the recommendation system
consists of the following parts:
– generation of the structure of the social graph of the recommendation system [29];
– simulation of the behavior of users and bots of the recommendation system.</p>
        <p>The generation of the structure of the social graph of the recommendation system was carried out
on the basis of the theory of complex networks [22, 30-32] and the Barabási-Albert (BA) model [22, 30],
which creates scale-free networks based on two conditions:</p>
      </sec>
      <sec id="sec-6-4">
        <title>Growth</title>
        <p>Starting with a small number of nodes m_0, at each time iteration one new node is added
with m links (where m ≤ m_0) connecting the new node to m different existing nodes.</p>
        <p>Preferential attachment. The probability p_i that the new node connects to some existing node i is
the higher the more connections the i-th node has:</p>
        <p>p_i = k_i / ∑_j k_j, (4)</p>
        <p>where k_i is the degree of the i-th node, and the denominator is the sum of the degrees of all existing
nodes in the network.</p>
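        <p>The two BA conditions can be sketched as follows; this is a minimal illustration, and the seed-clique choice and data structures are our assumptions:</p>

```python
import random

def barabasi_albert(n, m):
    """Grow a scale-free graph: start from a clique on m seed nodes, then
    attach each new node to m distinct existing nodes chosen with
    probability proportional to their current degree (formula (4))."""
    assert n >= m >= 2
    edges = set()
    degree = {i: 0 for i in range(n)}
    for i in range(m):                      # seed: a clique on m nodes
        for j in range(i + 1, m):
            edges.add((i, j))
            degree[i] += 1
            degree[j] += 1
    # Node i appears degree[i] times in this list, so a uniform draw from
    # it realizes preferential attachment: p_i = k_i / sum_j k_j.
    endpoints = [v for e in edges for v in e]
    for new in range(m, n):
        targets = set()
        while len(targets) < m:             # m distinct existing nodes
            targets.add(random.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
            endpoints += [t, new]
    return edges, degree
```

        <p>Because every accepted edge re-inserts both endpoints into the sampling list, high-degree nodes are drawn ever more often, which is exactly rule (4); hubs emerge as the network grows.</p>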
        <p>To generate the recommendation system the following subgraphs are used: 1) Users-Friends,
Users-Followers, Posts-Published, Posts-Viewed and Posts-Liked – are created by the generator of the
social network graph, 2) Users-Similarity, Posts-Similarity and Posts-Recommended – are created by
the recommendation system.</p>
        <p>Stages of the developed method:
Stage 1. A non-oriented subgraph Users-Friends is generated based on BA.</p>
        <p>Stage 2. A non-oriented subgraph Users-Followers is generated based on BA.</p>
        <p>Stage 3. Subgraphs Users-Friends and Users-Followers are combined into one graph.</p>
        <p>Stage 4. An oriented subgraph Posts-Published is generated based on a modified BA model. In the
first iteration m_0 users are randomly selected to "create" m_0 posts. The probability that the i-th user
publishes a post:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i) / ∑_j (n_1,j + n_2,j + n_3,j), (5)</p>
        <p>where n_1,i is the number of friends of the i-th node, n_2,i is the number of subscribers of the i-th node,
n_3,i is the number of posts of the i-th node, and the denominator sums these values over all existing
nodes in the network.</p>
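        <p>Formulas (5)-(7) are all weighted draws over nodes; a minimal sketch for (5), with illustrative attribute names:</p>

```python
import random

def pick_author(users):
    """Select the author of the next post with probability proportional
    to friends + subscribers + posts, as in formula (5)."""
    weights = [u["friends"] + u["subscribers"] + u["posts"] for u in users]
    # random.choices normalizes the weights, giving exactly w_i / sum_j w_j.
    return random.choices(users, weights=weights, k=1)[0]
```

        <p>Formulas (6) and (7) follow the same pattern with view counts (and, for (7), like counts) added to the weight of each post.</p>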
        <p>Stage 5. Subgraph Posts-Published is joined to the general graph.</p>
        <p>Stage 6. An oriented subgraph Posts-Viewed is generated based on the BA model. The probability
that the i-th post is viewed:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i) / ∑_j (n_1,j + n_2,j + n_3,j), (6)</p>
        <p>where n_1,i is the number of friends of the author of the i-th post, n_2,i is the number of subscribers of
the author of the i-th post, n_3,i is the number of views of the i-th post, and the denominator sums these
values over all existing nodes in the network.</p>
        <p>Stage 7. Subgraph Posts-Viewed is joined to the general graph.</p>
        <p>Stage 8. Subgraph Posts-Liked is generated. The probability that the i-th post gets a like:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i + n_4,i) / ∑_j (n_1,j + n_2,j + n_3,j + n_4,j), (7)</p>
        <p>where n_1,i is the number of friends of the author of the i-th post, n_2,i is the number of subscribers of
the author of the i-th post, n_3,i is the number of views of the i-th post, n_4,i is the number of likes of the
i-th post, and the denominator sums these values over all existing nodes in the network.</p>
        <p>Stage 9. Subgraph Posts-Liked is joined to the general graph.</p>
        <p>Stage 10. Subgraphs Users-Similarity, Posts-Similarity and Posts-Recommended are generated by
the algorithms of the selected recommendation system.</p>
        <p>Studies of the parameters of the generated social graphs [29] (in particular, density, network
diameter, clustering coefficient, etc.) showed their correspondence to the properties of
real social networks.</p>
        <p>Modeling of user behavior was based on the generation of a dynamic graph Users-Ratings-Items.
Stages of the developed method:
Stage 1.  Initialization of user parameters and system items.</p>
        <p>Stage 2.  Creating a "Seed" of social graph.</p>
        <p>Stage 3.  Simulation of time iterations. At each time iteration, a number of users, items, and bots
join the model. Users and bots rate items.</p>
        <p>Stage 4.  Stopping the model and saving the resulting data set to a file.</p>
        <p>Generating ratings for regular users:</p>
        <p>d_u,v = ∑_{i=0}^{n} |f_u,i − f_v,i|, (8)</p>
        <p>r_u,v = Ψ(5 − d_u,v + θ_u + θ_v), (9)</p>
        <p>where d_u,v is the distance between user u and item v in the multidimensional space of hidden factors;
n is the number of hidden factors; f_u,i is the i-th hidden factor of user u; f_v,i is the i-th hidden factor of
item v; θ_u is the user's offset in evaluating items (level of requirements to content); θ_v is the shift of the
item in obtaining ratings (content quality level); Ψ() is a function that converts the resulting fractional
number into a discrete rating from the set [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0].</p>
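        <p>A sketch of the rating model (8)-(9), assuming Ψ rounds to the nearest half point and clamps to the rating set – our reading of the formulas, not the authors' exact implementation:</p>

```python
def psi(x):
    """Map a fractional score to the nearest rating in {0.5, 1.0, ..., 5.0}."""
    return min(5.0, max(0.5, round(x * 2) / 2))

def rate(user_factors, item_factors, user_offset, item_shift):
    """Formulas (8)-(9): L1 distance in hidden-factor space, then a
    shifted score discretized by psi()."""
    d = sum(abs(fu - fv) for fu, fv in zip(user_factors, item_factors))
    return psi(5 - d + user_offset + item_shift)
```

        <p>A user whose hidden factors coincide with the item's gets the maximum rating; every unit of distance in factor space lowers the score, shifted by the user's strictness and the item's quality.</p>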
        <p>Among the existing attack models for software simulation we chose random, average and popular
attacks – since these models are the most commonly used, universal and illustrate different attack
strategies.</p>
        <p>Generating ratings for bots:</p>
        <p>r_u,v = { attackPattern(), if the item is "ordinary"; 5.0, if the item is "target" }, (10)</p>
        <p>where attackPattern() is a function generating ratings for non-target items according to the type of
attack model selected.</p>
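        <p>Rule (10) with the three chosen attack patterns might be sketched as follows; the noise level for the average attack and the fallback system mean are assumptions, and the popular pattern follows the earlier description (filler items get the minimum rating or one step above it, depending on their average):</p>

```python
import random

RATINGS = [0.5 * k for k in range(1, 11)]  # 0.5 .. 5.0

def nearest_rating(x):
    return min(RATINGS, key=lambda r: abs(r - x))

def bot_rating(item, attack="random", system_mean=3.5):
    """Formula (10): target items always get 5.0; filler ("ordinary")
    items are rated by the selected attack pattern."""
    if item["is_target"]:
        return 5.0
    if attack == "random":      # any rating, uniformly at random
        return random.choice(RATINGS)
    if attack == "average":     # near the item's known average rating
        return nearest_rating(item.get("avg_rating", system_mean)
                              + random.gauss(0, 0.5))
    if attack == "popular":     # well-known filler: r_min or r_min + 1
        return 1.0 if item.get("avg_rating", system_mean) >= system_mean else 0.5
    raise ValueError(attack)
```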
      </sec>
    </sec>
    <sec id="sec-7">
      <title>2.3.2. The results of experiments to verify the effectiveness of the proposed method of detecting bot networks</title>
      <p>
        Data sets for experiments were generated in the developed software simulation model of the
recommendation system. The format and statistical features of the data were generated as close as
possible to the corresponding characteristics of MovieLens Datasets [33]. Information attacks were
modeled using random, average, and popular attack models [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2-5</xref>
        ]. The Label Propagation algorithm
was used as a graph clustering algorithm.
      </p>
      <p>In the series of experiments, 5%, 10%, 20% and 30% of bots were generated in different series, all
other users of the system are authentic. In different experiments bots had different numbers of targets to
attack: 1, 5, 10, 15, 20, 25, 30, 35, 40 and 45 targets. Also 20% of profiles with a high level of activity in
the recommendation system were generated among authentic users. Such users gave significantly more
ratings than the average user. System items were generated to belong to one of the 19 clusters.</p>
      <p>The bot search algorithm for the studied recommendation system had the following parameters:</p>
      <p>The target-rating threshold was set to 0.05 – that is, a user is suspected if they have given target
ratings to at least 5% of the items identified by the information security subsystem as probable targets of
the attack. The threshold is small because it is not known how accurately the targets of the attack are
recognized; there may be many items whose ratings have changed naturally.</p>
      <p>The elimination of user profiles from the BotNet subgraph is based on the values of the
variance of ratings and the variance of time intervals between ratings. Authentic users who mistakenly
get into the BotNet subgraph are users whose profile data satisfy the following rule:</p>
      <p>|σ_u − σ_mean| &lt; 0.15 AND (σ_Δt,u &gt; 72 AND σ_Δt,u &lt; 600),</p>
      <p>where σ_u is the variance of ratings in the user profile, σ_mean is the truncated mean of the variance of
ratings over the system's user profiles (30% of the extreme values were truncated, i.e., a value equal to
the maximum possible percentage of bots in the system in question), and σ_Δt,u is the variance of the
time intervals between the user's ratings. The variance of ratings in the profiles of authentic users does
not differ significantly from the average variance (an attacker cannot know this value, only estimate it,
and so cannot accurately reproduce this characteristic when creating bot profiles). The check of the
variance of time intervals between ratings is as follows: users whose time intervals between ratings are
too uniform or too varied should be considered suspicious, while for authentic users this variance
usually lies within a certain range.</p>
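      <p>The elimination rule translates directly into a profile filter; a sketch, assuming population variance over the ratings and over the gaps between consecutive rating timestamps (the paper does not specify the exact variance estimator):</p>

```python
from statistics import pvariance

def rating_variance(profile):
    return pvariance(profile["ratings"]) if len(profile["ratings"]) > 1 else 0.0

def interval_variance(profile):
    ts = sorted(profile["timestamps"])
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return pvariance(gaps) if len(gaps) > 1 else 0.0

def is_probably_authentic(profile, mean_var):
    """Rule for removing a profile from the BotNet subgraph:
    |sigma_u - sigma_mean| < 0.15 AND 72 < interval variance < 600."""
    return (abs(rating_variance(profile) - mean_var) < 0.15
            and 72 < interval_variance(profile) < 600)
```

      <p>A bot that posts ratings on a fixed schedule fails the interval check (variance near zero), even if its rating variance happens to match the system average.</p>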
      <p>Table 1 shows the results of a series of experiments verifying the accuracy of the developed method
of identifying bot profiles. The following notations of attack models are used:
RA – random attack, AA – average attack, PA – popular attack.</p>
      <p>To assess the quality of the developed method the following metrics were selected.</p>
      <p>Precision – a measure that characterizes how many positive triggers of the information security
subsystem of bot detection are correct. It was calculated by the formula:</p>
      <p>precision = TP / (TP + FP), (11)</p>
      <p>where TP is the number of correctly detected bots and FP is the number of authentic users incorrectly
identified as bots.</p>
      <p>Recall (or Sensitivity) – a measure that characterizes the ability of the information security
subsystem to correctly detect the profiles of bots without taking into account false positives. It was
determined by the formula:</p>
      <p>recall = TP / (TP + FN), (12)</p>
      <p>where FN is the number of bots incorrectly identified as authentic users.</p>
      <p>RMSE – the root mean square error of classifying users as bots or authentic:</p>
      <p>RMSE = √((1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²), (13)</p>
      <p>where y_i is the true class of the i-th user, ŷ_i is the predicted class, and N is the number of users.</p>
      <p>As shown by the series of experiments in Table 1, the precision of bot detection by the developed
method averages 0.72 for a random attack, 0.81 for an average attack, and 0.71 for a popular attack.</p>
      <p>The accuracy of bot recognition by the developed method for the popular attack model was also
investigated in more detail (Table 2), which gives the average values of the accuracy of the developed
method for different numbers of information attack targets per bot.</p>
      <p>The experiments have shown that for a popular attack the bot detection precision of the developed
method averages 0.71, recall is 0.82, and RMSE is 0.22. The worst results were obtained when the bots
attacked a single target; in that case the accuracy of the developed method dropped to 0.57 on average.
The highest precision was observed when there were 25-30 attack targets, in which case it reached 0.78.</p>
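      <p>Metrics (11)-(13) over binary bot/authentic labels (1 = bot, 0 = authentic) can be computed as in this sketch:</p>

```python
from math import sqrt

def evaluate(y_true, y_pred):
    """Precision (11), recall (12) and RMSE (13) for a binary
    bot/authentic classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return precision, recall, rmse
```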
    </sec>
    <sec id="sec-8">
      <title>3. Conclusions</title>
      <p>A method of detecting botnets in the recommendation system based on graph clustering and analysis
of user actions was developed. This method allows botnets to be detected and distinguished by their sets
of attack objects. Appropriate algorithms have also been developed, along with an information security
subsystem of the recommendation system. This subsystem consists of a method of detecting an
information attack on the recommendation system based on trend analysis of item ratings and a method
of detecting botnets in the recommendation system based on graph clustering and analysis of user
actions. In contrast to existing methods, the developed information security subsystem provides higher
robustness of the system to external destabilizing factors represented by information attacks.</p>
      <p>Our experiments have shown that the precision of bot detection by the developed method averages
0.72 for a random attack, 0.81 for an average attack, and 0.71 for a popular attack. The worst results
were obtained when the bots attacked a single target; in that case the accuracy of the developed method,
for example for a popular attack, averaged 0.57. The highest precision was observed when there were
25-30 attack targets, reaching 0.78 for a popular attack. In the case of a successful bot attack that shifted
target ratings, the presence of a botnet (and a significant percentage of its bots) was always detected.
All user profiles identified as bots should be verified using existing bot identification methods based on
statistical analysis of individual profile data. This allows the results to be refined and the user profiles
mistakenly identified as bots to be removed from this set.</p>
    </sec>
    <sec id="sec-9">
      <title>4. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Punam</surname>
          </string-name>
          ,
          <article-title>A Comprehensive Study of Shilling Attacks in Recommender Systems</article-title>
          ,
          <source>IJCSI International Journal of Computer Science Issues</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          ). URL: https://www.ijcsi.org/articles/A-comprehensive-study-of-shilling-attacks-in-recommendersystems.php.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <article-title>Shilling attack models in recommender system</article-title>
          ,
          <source>in: Proceedings of the International Conference on Inventive Computation Technologies (ICICT)</source>
          ,
          Coimbatore
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://ieeexplore.ieee.org/document/7824865/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaleli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bilge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Polat</surname>
          </string-name>
          ,
          <article-title>Shilling attacks against recommender systems: a comprehensive survey</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>42</volume>
          (
          <year>2014</year>
          ). URL: https://doi.org/10.1007/s10462-012-9364-9.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Kantor</surname>
          </string-name>
          (Eds.),
          <source>Recommender Systems Handbook</source>
          , Springer, Boston,
          <year>2011</year>
          . URL: https://doi.org/10.1007/978-0-387-85820-3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhaumik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Effective attack models for shilling itembased collaborative filtering system</article-title>
          ,
          <source>in: Proceedings of the 2005 WebKDD Workshop</source>
          , held in
          <source>conjuction with ACM SIGKDD</source>
          <year>2005</year>
          , Chicago, Illinois,
          <year>2005</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . URL: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.7759&amp;rep=rep1&amp;type=pdf#page=23.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Shilling recommender systems for fun and profit</article-title>
          ,
          <source>in: Proceedings of the 13th International World Wide Web Conference</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>402</lpage>
          . URL: https://dl.acm.org/doi/10.1145/988672.988726.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Kurban</surname>
          </string-name>
          ,
          <article-title>Modern information wars in social online networks</article-title>
          ,
          <source>Information Society 23</source>
          (
          <year>2016</year>
          )
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          . URL: http://nbuv.gov.ua/UJRN/is_2016_23_15. (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Gubanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Novikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Chhartishvili</surname>
          </string-name>
          ,
          <article-title>Social networks: models of information influence, management and confrontation</article-title>
          ,
          <source>Publishing House of Physical and Mathematical Literature</source>
          , Moscow,
          <year>2010</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V. P.</given-names>
            <surname>Horbulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. H.</given-names>
            <surname>Dodonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Lande</surname>
          </string-name>
          ,
          <article-title>Information operations and security of society: threats, counteraction</article-title>
          , modeling: monograph, Intertechnology, Kyiv,
          <year>2009</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Ostapenko</surname>
          </string-name>
          ,
          <article-title>Information operations and attacks in socio-technical systems</article-title>
          , Hotline - Telecom, Moscow,
          <year>2007</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Bohush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. K.</given-names>
            <surname>Yudin</surname>
          </string-name>
          ,
          <article-title>Information security of the state</article-title>
          , MK-Press, Kyiv,
          <year>2005</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Ulichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ye. V.</given-names>
            <surname>Meleshko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sawicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Smailova</surname>
          </string-name>
          ,
          <article-title>Computer modeling of dissemination of informational influences in social networks with different strategies of information distributors</article-title>
          ,
          <source>in: Proc. SPIE 11176</source>
          , Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments
          , Wilga, Poland,
          <year>2019</year>
          ,
          111761T
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhaumik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness</article-title>
          ,
          <source>ACM Transactions on Internet Technology</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ) (
          <year>2007</year>
          ). URL: https://doi.org/10.1145/1278366.1278372.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>