<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Clustering in the Recommendation System of Social Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yelyzaveta Meleshko</string-name>
          <email>elismeleshko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Yakymenko</string-name>
          <email>m.yakymenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Semenov</string-name>
          <email>s_semenov@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Ukrainian National Technical University</institution>
          ,
          <addr-line>8, Universytetskyi prosp., Kropyvnytskyi, 25006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University "Kharkiv Polytechnic Institute"</institution>
          ,
          <addr-line>2, Kyrpychova str., Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>This paper proposes a method for detecting a network of bots in the recommendation system based on graph clustering and analysis of user actions. This method is proposed to be used if there are signs of an attack on the recommendation system and a set of probable targets of the attack is revealed. A series of experiments was carried out to test the effectiveness of the proposed method. The input data for the experiments were generated in the developed software simulation model of users and items of the recommendation system, which made it possible to simulate the influence of external destabilizing factors on the system, such as information attacks. A subsystem of information security of the recommendation system has been developed. This subsystem consists of a method of detecting an information attack on the recommendation system based on trend analysis in item ratings and a method of detecting bot networks in the recommendation system based on graph clustering and user action analysis. Appropriate algorithms have also been developed.</p>
        <p>Keywords: recommendation systems, data analysis, clustering, information attacks, bot networks, graphs, algorithms, computer simulation, social networks, websites</p>
        <p>COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22-23, 2021, Kharkiv, Ukraine</p>
        <p>ORCID: 0000-0001-8791-0063 (Ye. Meleshko); 0000-0003-3290-6088 (M. Yakymenko); 0000-0003-4472-9234 (S. Semenov)</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Most often information attacks on recommendation systems are carried out for marketing purposes
in order to increase the ratings of the attacking party's products or decrease the ratings of competitors'
products [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1-6</xref>
        ]. Attacks on recommendation systems
may also be aimed at propagating
informational influences during informational conflicts, e.g., for political purposes.
Various
information influences are often carried out through social networks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref7 ref8 ref9">7-12</xref>
        ]. Recommendation systems
as their components have become one of the targets for information attacks in order to implement
such influences [
        <xref ref-type="bibr" rid="ref1 ref13 ref2 ref3 ref4 ref5 ref6">1-6, 13-15</xref>
        ]. By successfully attacking a social network recommendation system or
news aggregator one can change the content and order in which items of news feed are displayed to
system users. It can be used not only for marketing but also for political or fraudulent purposes. To
implement attacks on recommendation systems bot networks are used, since only a certain set of
profiles in the system can influence recommendations formation by their joint actions [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1-6, 14, 15</xref>
        ].
      </p>
      <p>
        Mostly in existing studies [
        <xref ref-type="bibr" rid="ref4">4, 14-17</xref>
        ] it is proposed to consider the detection of an information
attack on the recommendation system as identical to the detection of bot profiles, and for detecting a
bot network, all user profiles are clustered based on their statistics by non-hierarchical methods (for
example, k-means clustering). After such clustering, bots and authentic users get into different
clusters.</p>
      <p>2021 Copyright for this paper by its authors.</p>
      <p>Since the detection of bot profiles is quite a resource-intensive task, this paper proposes to split the
task of protecting the recommendation system from information attacks into two parts: 1) detection of
attack signs and 2) detection and neutralization of bot profiles.</p>
      <p>Attack detection can be a less resource-intensive task if it is realized as tracking the dynamics
of ratings of system items (all ratings, or only those critical from the point of view of information
security). For example, if the ratings of protected items start to change rapidly, and the new ratings that
drive these changes do not match the previous average ratings of these items, one should check for
bots among the users who made such ratings. This approach reduces the number of user profile
checks: first, because profiles need to be checked only if an attack is suspected; second, because
only the profiles of users who carry out suspicious actions need to be checked, not those of all users.</p>
      <p>If in the system items with an abnormal change in rating trends are detected (that could
hypothetically be the result of a botnet attack), the next logical step is to try to identify user profiles that
influenced this change and try to find out if they are connected to a coordinated network or networks.</p>
      <p>It is natural that changes in trends in the ratings of items are influenced by all users who gave them
ratings equal to the target – high if the item's ratings increased or low if the item's ratings decreased.
The set of such users is easily identified by simple queries to the recommendation system database.
But it is much more difficult to determine which user profiles are really part of the botnet and perform
coordinated actions with other members of the botnet and who simply gave a rating that corresponded
to their preferences.</p>
      <p>Thus, the purpose of this work is to develop a method of detecting bot networks in the
recommendation system of a social network, which could be used only after detecting signs of an attack
to check only those user profiles that interacted with possible attack objects.</p>
      <p>
        The problem of botnet revealing among the profiles of system users can be reduced to the problem
of finding a subgraph in the social graph of the recommendation system. The vertices in such a
subgraph will be user profiles linked by joint actions that influenced changes in the rating trends of
all or most items that are probable targets of the information attack [
        <xref ref-type="bibr" rid="ref4">4, 18</xref>
        ]. Therefore, to develop a method for
detecting a bot network we decided to use methods of graph clustering.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Development of a method for detecting bot networks in the recommendation system</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Detection of signs of information attacks on the recommendation system</title>
      <p>Since it is proposed in this paper to split the problem of protecting the social network
recommendation system from information attacks into two parts – detecting signs of an attack and
searching for bot profiles – we will first consider how to solve the first problem before developing a
method for solving the second one.</p>
      <p>In [19] a method to determine the set of probable goals of the information attack of the bot
network by analyzing the trends of ratings of items of the recommendation system was proposed.
Let's consider the basic principles of this method.</p>
      <p>The set of possible targets of the attack will be denoted as G.</p>
      <p>The presence of an information attack on the recommendation system causes a change in the
ratings of a group of items (increases or decreases them), but the change in the ratings of items is not
a sufficient reason to think that there is an attack, since ratings may change due to the actions of
authentic users. Therefore, a number of additional features which may indicate an information attack
have been selected. A set of indicators to determine the presence or absence of an information attack
on the item of the recommendation system was proposed as follows:</p>
      <p>K_i = {tr_i, t̂r_i, D_i^r, D_i^t, N_i^r, N_i^tar, N_i^rec}, (1)
where tr_i is the trend of the rating dynamics of item i, which can take the values {–1, 0, 1} –
respectively "rating downward trend", "no change" and "upward trend"; t̂r_i is a prediction for the trend
of the rating dynamics of item i, for example, based on the Hurst exponent [20]; D_i^r is the variance of
the ratings of item i; D_i^t is the variance of the time of grading for item i; N_i^r is the number of
ratings of item i at the studied time interval; N_i^tar is the number of target ratings for item i at the
studied time interval; and N_i^rec is the number of times item i got into user recommendation lists at
the studied time interval.</p>
      <p>We will assume that an attack on the recommendation system occurs when the ratings of one or
more items of the system are purposefully changed by the joint actions of a group of system user
profiles. The size of the damage from an attack does not always depend on the number of items:
a successful change of the rating of even one item can have great consequences if the item is
important and belongs, for example, to the social, political or medical sector.</p>
      <p>It is proposed to determine the presence and type of attack according to the following rules:</p>
      <p>Rule 1. If the item has any 5 of the following signs – an upward rating trend tr_i = 1, a Hurst
exponent H_i &gt; 0.73, D_i^r ≤ D_avg^r, D_i^t ≤ D_avg^t, N_i^r &gt; N_avg^r, N_i^tar &gt; N_avg^tar,
N_i^rec &gt; N_avg^rec (where the subscript avg denotes the average value of the corresponding
indicator over the items of the system) – then we believe that there is a high probability of a rating
upgrade attack for this item.</p>
      <p>Rule 2. If the item has any 5 of the following signs – a downward rating trend tr_i = −1, a Hurst
exponent H_i &gt; 0.73, D_i^r ≤ D_avg^r, D_i^t ≤ D_avg^t, N_i^r &gt; N_avg^r, N_i^tar &gt; N_avg^tar,
N_i^rec &lt; N_avg^rec – then we believe that there is a high probability of a rating downgrade attack
for this item.</p>
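<p>The two rules can be sketched as a small check over the indicator set (1). This is a minimal sketch under stated assumptions: the dictionary key names, the `averages` baseline, and the reading of 0.73 as a Hurst-exponent threshold are illustrative, not the paper's exact notation.</p>

```python
# Hypothetical sketch of Rules 1 and 2: an attack is flagged when at least
# 5 of the 7 signs hold. Key names and the `averages` baseline are
# illustrative assumptions, not the authors' exact notation.

def attack_signs(item, averages, hurst_threshold=0.73):
    """Return 'upgrade', 'downgrade' or None for one item's indicator dict."""
    up = [
        item["trend"] == 1,                                  # tr_i = 1
        item["hurst"] > hurst_threshold,                     # persistent trend
        item["rating_var"] <= averages["rating_var"],        # D^r below average
        item["time_var"] <= averages["time_var"],            # D^t below average
        item["n_ratings"] > averages["n_ratings"],           # N^r above average
        item["n_target_ratings"] > averages["n_target_ratings"],
        item["n_recommended"] > averages["n_recommended"],
    ]
    down = list(up)
    down[0] = item["trend"] == -1                            # tr_i = -1
    down[6] = item["n_recommended"] < averages["n_recommended"]
    if sum(up) >= 5:
        return "upgrade"
    if sum(down) >= 5:
        return "downgrade"
    return None
```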
      <p>Once the attack on the recommendation system is detected and the set G of probable targets of bots
is created, it is logical to investigate the profiles of all users who influenced the change of the rating
trends of items from this set G. In this work, precisely such actions are proposed
for the search of bot profiles.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Identification of bot profiles in the recommendation system</title>
      <p>To solve the problem of identification of bot profiles, a method based on graph
clustering and analysis of user actions using coefficients of "distrust" was developed in this paper.
The method consists of the following stages:</p>
      <p>Stage 1. Create a set S of suspicious user profiles, in which we place the profiles that gave
target ratings for items from the set G of probable targets of the attack.</p>
      <p>Stage 2. Assign to each user from the set S the label :suspicious and a coefficient of distrust
calculated by the following formula:
d_u = (1 / N_G) ∑_{i ∈ G} k_{u,i}, (2)
where k_{u,i} indicates the presence of a target rating from user u for item i belonging to the set G of
probable targets of the attack, taking the value 1 if the target rating exists and 0 in the absence of
such a rating from user u for item i; N_G is the number of items in the set G of probable targets of
the attack.</p>
      <p>Stage 3. For each pair of users u1 and u2 from the set S, where d_u1 and d_u2 both reach a given
distrust threshold, create an edge between them with the label :BotNet.</p>
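<p>Stages 1-3 can be sketched as follows, assuming the target ratings have already been extracted per user. The function names and the default value of the distrust threshold are hypothetical; the paper does not fix a specific threshold here.</p>

```python
from itertools import combinations

# Minimal sketch of Stages 1-3: `ratings` maps user -> set of items the user
# gave a target rating to; `targets` is the set G of probable attack targets.
# The threshold value is an assumption for illustration.

def distrust(ratings, targets):
    """Stage 2: distrust = share of probable targets a user rated with a target rating."""
    return {u: len(rated & targets) / len(targets) for u, rated in ratings.items()}

def botnet_edges(ratings, targets, threshold=0.5):
    """Stage 3: connect every pair of users whose distrust reaches the threshold."""
    d = distrust(ratings, targets)
    suspicious = [u for u, v in d.items() if v >= threshold]
    return {frozenset(p) for p in combinations(sorted(suspicious), 2)}
```

The resulting edge set is exactly the :BotNet subgraph on which Stage 4 performs graph clustering.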
      <p>Stage 4. Perform graph clustering for the subgraph containing vertices with labels :User and
:suspicious and edges with the label :BotNet. The results of such clustering will be as follows: all bots
will get into one large cluster if there is one botnet, or into several large clusters if there are several
botnets; authentic users will get into different clusters, each of which will contain one user or a
small number of users. It is also possible that a certain number of authentic users will get into a
cluster with bots, or that a group of similar and active authentic users will create a separate cluster
if their actions shift the ratings of some items.</p>
      <sec id="sec-4-1">
        <title>Stage 5. Define the largest clusters as possible bot networks</title>
        <p>Define the largest clusters, consisting of at least (N_min − ε) users, where N_min is the minimum
number of users that may affect the performance of the recommendation system (it depends on the
parameters of a particular system) and ε is the approximate value of the error when dividing user
profiles into clusters. We consider such a cluster (or clusters) as a possible bot network (bot networks).
For users who do not get into these clusters, remove the edges with the label :BotNet. Users who got
into the subgraph BotNet need to be further checked by analyzing the statistical characteristics of
their profiles, for example, using neural networks [21].
Also, for an additional check of profiles from the subgraph BotNet, one can search for certain features
characteristic of bots, for example, poorly completed personal data or extremely high activity (the
characteristics of bots depend on the specific system and can become known in the process of
collecting statistics during its operation). One common feature of bots in many systems is
the difference between the variance of ratings and the variance of time intervals
between ratings in bot profiles and the average values of the corresponding variances in the
profiles of system users. For a particular system, the features of bots can include: features of profile
registration, features of filling the profile with personal information, style of writing and content of
comments, the user's friends list, etc. After checking the statistics of the individual user profiles that
got into the subgraph BotNet, the subgraph has to be corrected by removing from it the users
identified by their statistics as authentic. If there is a cluster in which all users are recognized as
authentic, it should cease to be considered a bot network.</p>
        <p>Stage 6. Correct the set G after analyzing the user ratings from the subgraph BotNet. It is necessary
to check which items the users from the botnet have targeted in a coordinated way. Remove from G the
items that received no target ratings at all, or only a small percentage, from users identified as bots.
Add to the set G the items that received target ratings from all bots (or a large percentage of them).</p>
        <p>Research on the most well-known and frequently used methods of graph clustering was
conducted [22-27]. Consider the methods of graph clustering that can be used in the proposed method
of detecting a network of bots, and the basic principles of their operation.</p>
        <p>The Louvain (or Multilevel) method is a graph clustering method based on multilevel optimization
of the modularity function [23, 28]. Modularity is a numerical characteristic that describes the strength
of the division of a graph into modules and is one measure of the structure of networks or
graphs [23, 25, 26]. To assess modularity, the following formula can be used:</p>
        <p>Q = (1 / 2M) ∑_{i,j} (A_{ij} − k_i k_j / 2M) δ(c_i, c_j), (3)
where M is the number of edges in the graph; A is the adjacency matrix of the graph; k_i is the
number of edges adjacent to the vertex i; k_j is the number of edges adjacent to the vertex j;
and δ(c_i, c_j) is a delta function equal to one if c_i = c_j (the vertices are in the same cluster)
and zero otherwise.</p>
        <p>This value is equal to the difference between the fraction of edges within clusters for the
current splitting and the expected fraction if the edges were placed randomly [26].</p>
        <p>The value of modularity shows how pronounced the clusters are; it is:
• close to one for a partition with dense intra-cluster and sparse inter-cluster connections;
• equal to zero when all vertices are placed in one cluster;
• negative for unsuccessful partitions.</p>
        <p>In essence, the value of modularity can be used to assess the quality of splitting a graph into
clusters. A good partition is characterized by the fact that the number of internal links within
each cluster is greater than the number of its external links.</p>
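<p>A direct computation of modularity on a toy graph can illustrate this. The sketch below evaluates the sum over vertex pairs from an edge list and a vertex-to-cluster mapping; the representation and the function name are assumptions for illustration.</p>

```python
# Direct computation of modularity for a small undirected graph given as an
# edge list; `communities` maps vertex -> cluster id. Illustrative sketch,
# not tied to any particular library.

def modularity(edges, communities):
    nodes = sorted({v for e in edges for v in e})
    m = len(edges)
    deg = {v: 0 for v in nodes}
    adj = {(i, j): 0 for i in nodes for j in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
        adj[(a, b)] += 1          # count each undirected edge in both directions
        adj[(b, a)] += 1
    q = 0.0
    for i in nodes:
        for j in nodes:
            if communities[i] == communities[j]:
                q += adj[(i, j)] - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)
```

For two triangles joined by a single bridge edge, splitting along the bridge gives a positive modularity, while putting all six vertices into one cluster gives zero, matching the behaviour described above.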
        <p>The value of modularity characterizes not how much denser the intra-cluster connections of a given
partition are than the inter-cluster ones, but how much denser they are compared to some baseline
density. Therefore, a comparison is made with the "null hypothesis" that the edges are distributed
randomly, that is, that there are no patterns in the distribution of edge density within the graph.</p>
        <p>The principle of graph clustering algorithms based on modularity optimization is that, at each step
of the algorithm, each vertex is somehow assigned to a cluster, the modularity value is calculated,
and the graph vertices are redistributed between clusters so as to increase the modularity value. Such
algorithms stop when it is no longer possible to improve the value of modularity.</p>
        <p>The Louvain method consists of two parts. The first part is a greedy optimization of modularity.
In the second part, a new graph is created with meta-vertices in the form of the found clusters, and
with edges carrying the total weight of all edges going from one cluster to another (loops with the
total weights of connections inside each cluster are also created). Such a graph is called a metagraph.
The algorithm is then restarted on the new graph. This is one of the most well-known methods due to
its speed.</p>
        <p>The LabelPropagation method is a graph clustering method based on graph labeling. It divides the
graph into clusters as follows: each vertex in the graph belongs to the cluster to which most of its
neighbors belong; if there are several such clusters, one of them is chosen at random [23, 26, 28].
The method is based on the idea that a vertex belongs to the cluster to which the largest number of
its neighboring vertices belongs. Consider the principle of operation of this method. At the initial
point in time, every vertex is assigned a separate cluster. Each vertex receives the label or color of the
corresponding cluster. Then, for each vertex, it is checked which clusters its neighbors belong to, and
cluster membership is redistributed. Because of ties, it is important to change the order of
vertex traversal at each iteration. The algorithm ends when there is nothing left to change – all vertices
belong to the same clusters as most of their neighbors. To improve the results, one can use the
following trick: run the algorithm several times, save the result of each run, and choose the
best option for splitting the graph. The main advantage of this algorithm is its almost linear
complexity. Its disadvantage is that on noisy graphs it often combines all vertices into one cluster or
a small number of clusters.</p>
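<p>The procedure just described can be sketched in a few lines of pure Python with a seeded random traversal order. This is an illustrative sketch of the general LabelPropagation idea, not the implementation evaluated in the paper; the adjacency-dict representation is an assumption.</p>

```python
import random

# Compact sketch of label propagation: every vertex repeatedly adopts the
# label most common among its neighbours; the traversal order is shuffled
# each sweep and ties are broken at random.

def label_propagation(adj, seed=0, max_iter=100):
    rng = random.Random(seed)
    labels = {v: v for v in adj}           # each vertex starts in its own cluster
    for _ in range(max_iter):
        order = list(adj)
        rng.shuffle(order)                 # randomised traversal order
        changed = False
        for v in order:
            counts = {}
            for n in adj[v]:
                counts[labels[n]] = counts.get(labels[n], 0) + 1
            if not counts:
                continue                   # isolated vertex keeps its label
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            new = rng.choice(candidates)   # break ties at random
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break                          # converged: no vertex wants to move
    return labels
```

Because labels only travel along edges, vertices in different connected components can never end up with the same label, which is the behaviour one wants when the :BotNet subgraph splits into separate candidate botnets.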
      </sec>
    </sec>
    <sec id="sec-5">
      <title>2.3. Experiments to test the effectiveness of the proposed method on a software model</title>
      <p>A series of experiments was carried out to test the developed method of detecting a network of bots.</p>
      <p>Since there is no information on bot profiles in the available open datasets for testing
recommendation systems, it is necessary to create a software simulation model of users and
recommendation system items to generate datasets that will also contain bot profiles.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3.1. Software simulation model of users and items of the recommendation system</title>
      <p>A software simulation model of users and items of the recommendation system with the ability to
simulate information attacks has been developed. Let's consider the basic principles of work of the
developed software simulation model of users and items of the recommendation system.</p>
      <p>The main type of attacks on recommendation systems are profile injection attacks, which involve
the creation of a number of bot profiles (bot networks) that, by coordinated actions, change the ratings
and frequency of getting target items into recommendation lists.</p>
      <p>A study of existing models of attacks on recommendation systems was carried out in this paper.
A bot profile can contain the following types of ratings:
• Ratings for filler items, to simulate the actions of real users. The attacker does not try
to change these ratings, but instead tries to find values for them that are as close as possible to those
of the real target group of users he seeks to influence.
• Ratings for push-target items; these are the maximum ratings (or close to them) in the
system, for target items whose rating the attacker seeks to increase.
• Ratings for nuke-target items; these are the minimum ratings (or close to them) in the
system, for target items whose rating the attacker seeks to decrease.</p>
      <sec id="sec-6-1">
        <title>Random Attack</title>
        <p>The number of target items in a bot profile can vary from one up to the total number of target
items and may be contained only in the push set, only in the nuke set, or in both; the number of items
to fill the profile ranges from zero up to the number of items in the system.</p>
        <p>In bot profiles, the filler set will be filled with ratings for randomly selected items. Ratings for
the selected items will also be chosen randomly, but in such a way that they are close to the global
average score in the system; for example, a normal distribution with a mathematical expectation equal
to the global average can be used. The target item is given the maximal or minimal rating, depending
on the targets of the attack. The knowledge and effort required to carry out such an attack are
minimal – the global average score in many systems can be easily learned directly or through
indirect data. This attack is not very effective.</p>
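<p>A random-attack bot profile generator might look as follows. The 1-5 rating scale, the standard deviation, and the function signature are assumptions for illustration; the paper only specifies a normal distribution around the global average.</p>

```python
import random

# Illustrative generator for "random attack" bot profiles: filler items get
# ratings drawn from a normal distribution around the global average, and the
# target item gets the maximal (push) or minimal (nuke) rating.

def random_attack_profile(target, filler_items, global_mean, push=True,
                          scale=(1, 5), sigma=1.0, rng=random):
    clip = lambda r: min(max(r, scale[0]), scale[1])   # keep ratings on the scale
    profile = {i: clip(round(rng.gauss(global_mean, sigma))) for i in filler_items}
    profile[target] = scale[1] if push else scale[0]   # push or nuke the target
    return profile
```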
      </sec>
      <sec id="sec-6-2">
        <title>Average Attack</title>
        <p>This attack uses the individual average of each item's ratings to create the filler set. More
information needs to be gathered for this attack. However, an average attack can be successful even
with a small filler set, which reduces the amount of information that needs to be collected. But the
cost of such a reduction in the required data is a large number of profiles with almost the same set of
rated items, which will, of course, be easy to detect. This attack is more effective than a random one,
but it is practically ineffective against collaborative filtering algorithms based on the item-based
neighborhood model.</p>
        <p>The average attack requires a relatively large amount of knowledge about the statistics of the
actions of real users in the system. Reasonable defense of the recommendation system from such
attacks will make it difficult for the attacker to collect the necessary data. To bypass such defense,
other attacks are used, for which the requirements for the amount of knowledge are much lower.</p>
        <p>Consider existing attacks that require less knowledge than the average attack, but work more
effectively than a random attack.</p>
        <p>Bandwagon Attack</p>
        <p>The purpose of this attack is to associate the attacked item with a small number of items that are
often rated by users (let's call them widely known). The attacker creates bot profiles that contain a
filler set of ratings of widely known items. Such profiles are more likely to look like those of a large
number of users, as well-known items are those that are rated by many users. Data for such an attack
is quite easy to obtain: among the well-known items, several are chosen at random. These items are
given maximum ratings along with the target item. Some of the items in the filler set can be rated
randomly, for example as in a random attack, to vary the bot profiles. This is a fairly effective attack,
but like the average attack it becomes ineffective when used against collaborative filtering based on
the item-based neighborhood model.</p>
        <p>Segment Attack</p>
        <p>The basic idea of this attack is to change the rating of the item within a target group of users with
known or easily predictable preferences. The target item is rated only within a certain segment of
users, so that it gets recommended only to them. Otherwise, if the target item gets into
recommendations for users from other segments, it may start receiving low ratings from them, and the
number of such low ratings may be greater than the artificially inflated ones. To carry out such an
attack, an attacker needs to find real users who belong to the target segment and collect data on the
ratings that they usually give to system items. As with the bandwagon attack, it is usually determined
which items in the target segment are widely known. These items are assigned the maximal rating
together with the target item.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Attacks to decrease ratings and attack classification</title>
        <p>To ensure maximum effect from the attack, some items for the filler set are randomly selected and
receive minimal ratings, which allows the attacker to make the bot profiles varied. This attack is
effective against collaborative filtering algorithms based on the item-based neighborhood model.</p>
        <p>It should be noted that all of the attack models discussed above can be used to decrease item
ratings, but there are specialized attacks that work better than others specifically at decreasing
ratings.</p>
        <p>Consider attack models designed to decrease the ratings of recommendation system items.</p>
        <p>Love/Hate Attack</p>
        <p>This attack is very simple – no knowledge is required. The target item is given a minimal rating,
and the items for the filler set are chosen randomly and receive maximal ratings. Despite its
extreme simplicity, this is one of the most effective downgrade attacks against collaborative filtering
algorithms based on the user-based neighborhood model.</p>
        <p>Reverse Bandwagon Attack</p>
        <p>This is a variation of the above-described bandwagon attack that selects, for the filler set, widely
known items that are rated low by the vast majority of users. These items are rated low in bot
profiles, and low ratings are also assigned to the target item. Thus, the target item begins to be
associated with items that are not liked by a large number of users, and this increases the probability
that low ratings will be predicted for the item and that it will not be included in the lists of
recommendations. Although this attack is not as effective against user-based systems as the
knowledge-heavy average attack, it is a very effective downgrade attack against item-based systems.</p>
        <p>Low-knowledge attacks use widely known items to fill a bot's profile with ratings for them. In this
way, an attacker can create a profile similar to that of the average user by examining the ratings of
only widely known items. If an attacker knows exactly which algorithm the recommendation system
uses, he can gather more information to attack.</p>
        <p>Thus, attacks can be classified into:</p>
        <p>Attacks with little knowledge – this type of attack does not require detailed knowledge of the
distribution of scores in the system. It requires system-independent knowledge that can be easily
obtained through public sources of information.</p>
        <p>Attacks with a lot of knowledge – an attacker needs to have as much knowledge as possible
about the system's algorithms and the distribution of ratings of system items. For example, some
attacks require the attacker to know the mean and standard deviation for each item in the system.
An example of an attack with a lot of knowledge is a popular attack.</p>
        <p>Popular Attack</p>
        <p>Assume that the system uses a standard user-based collaborative filtering algorithm, where
similarity between users is determined by the Pearson correlation. As in the bandwagon attack, the
filler set is filled using well-known system items. However, this alone does not guarantee high
similarity between the bot profile and real profiles. Therefore, the popular attack uses the average
values of the ratings of the selected well-known items to fill the filler set. In order to vary the profile
data, some randomly selected filler items are assigned a low rating or a low rating plus one,
depending on whether the average rating for the item is higher or lower. This strategy leads to
positive correlations between bot profiles and authentic profiles. It does not take a lot of knowledge to
identify well-known items, but a lot of information needs to be gathered to determine the average
ratings of the selected items. A popular attack can also easily be tuned for downgrade attacks. This
attack can be detected by comparing profiles in the system – the bot profiles of one bot network will
be very similar.</p>
        <p>Probe Attack</p>
        <p>The more similar bot ratings are to those of real users, the more difficult it is to recognize bot
profiles. Knowledge of the real preferences of different segments of users can be obtained from the
system itself through a probe attack. To carry out this attack, the attacker creates a seed profile and
then uses it to obtain recommendations from the system. These recommendations are created on the
basis of system information about real users, so using the received recommendations allows the
attacker to create bot profiles more similar to real users. The attacker can probe a small number of
users to then affect a small group, as in a segment attack, or a large part of them – to obtain
information, for example, for an average attack. An attacker needs only a small number of seed
profiles in order for the recommendation system to provide him with the necessary information in the
form of recommendations.</p>
        <p>Our studies of models of information attacks on recommendation systems have shown that the
easiest to implement and least resource-intensive attacks are the random and average ones, and the
most effective and inconspicuous (although quite resource-intensive) attack is the popular attack. The
other studied attack models are in fact modifications of these, designed, for example, to attack only a
certain segment (segment attack), to attack only to lower ratings (love/hate attack), or to reduce
resource consumption at the cost of the botnet's inconspicuousness (bandwagon attack, love/hate
attack). Therefore, in the software simulation model of users and items of the recommendation
system, we decided to model botnets based on the three main models of attacks – the random, average
and popular attacks.</p>
        <p>[Figure: Models of profile injection attacks on recommendation systems – random, probe, average, segment, popular, bandwagon and love/hate attacks.]</p>
        <p>The proposed method of software simulation of users and items of the recommendation system
consists of the following parts:
– generation of the structure of the social graph of the recommendation system [29];
– simulation of the behavior of users and bots of the recommendation system.</p>
        <p>The generation of the structure of the social graph of the recommendation system was carried out
on the basis of the theory of complex networks [22, 30-32] and the Barabási-Albert (BA) model [22, 30],
which creates scale-free networks based on two conditions:</p>
      </sec>
      <sec id="sec-6-4">
        <title>Growth</title>
        <p>Starting with a small number of nodes m_0, at each time iteration one new node is added
with m links (where m ≤ m_0) connecting the new node to m different existing nodes.</p>
        <p>Preferential attachment. The probability p_i that the new node connects to some existing node i is
the higher the more connections the i-th node has:</p>
        <p>p_i = k_i / ∑_j k_j, (4)</p>
        <p>where k_i is the degree of the i-th node, and the denominator is the sum of the degrees of all existing
nodes in the network.</p>
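        <p>The two BA conditions can be sketched as follows; this is a minimal illustration, and the seed-clique choice and data structures are our assumptions:</p>

```python
import random

def barabasi_albert(n, m):
    """Grow a scale-free graph: start from a clique on m seed nodes, then
    attach each new node to m distinct existing nodes chosen with
    probability proportional to their current degree (formula (4))."""
    assert n >= m >= 2
    edges = set()
    degree = {i: 0 for i in range(n)}
    for i in range(m):                      # seed: a clique on m nodes
        for j in range(i + 1, m):
            edges.add((i, j))
            degree[i] += 1
            degree[j] += 1
    # Node i appears degree[i] times in this list, so a uniform draw from
    # it realizes preferential attachment: p_i = k_i / sum_j k_j.
    endpoints = [v for e in edges for v in e]
    for new in range(m, n):
        targets = set()
        while len(targets) < m:             # m distinct existing nodes
            targets.add(random.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
            endpoints += [t, new]
    return edges, degree
```

        <p>Because every accepted edge re-inserts both endpoints into the sampling list, high-degree nodes are drawn ever more often, which is exactly rule (4); hubs emerge as the network grows.</p>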
        <p>To generate the recommendation system the following subgraphs are used: 1) Users-Friends,
Users-Followers, Posts-Published, Posts-Viewed and Posts-Liked – are created by the generator of the
social network graph, 2) Users-Similarity, Posts-Similarity and Posts-Recommended – are created by
the recommendation system.</p>
        <p>Stages of the developed method:
Stage 1. A non-oriented subgraph Users-Friends is generated based on BA.</p>
        <p>Stage 2. A non-oriented subgraph Users-Followers is generated based on BA.</p>
        <p>Stage 3. Subgraphs Users-Friends and Users-Followers are combined into one graph.</p>
        <p>Stage 4. An oriented subgraph Posts-Published is generated based on a modified BA model. In the
first iteration m_0 users are randomly selected to "create" m_0 posts. The probability that the i-th user
publishes a post:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i) / ∑_j (n_1,j + n_2,j + n_3,j), (5)</p>
        <p>where n_1,i is the number of friends of the i-th node, n_2,i is the number of subscribers of the i-th node,
n_3,i is the number of posts of the i-th node, and the denominator sums these values over all existing
nodes in the network.</p>
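        <p>Formulas (5)-(7) are all weighted draws over nodes; a minimal sketch for (5), with illustrative attribute names:</p>

```python
import random

def pick_author(users):
    """Select the author of the next post with probability proportional
    to friends + subscribers + posts, as in formula (5)."""
    weights = [u["friends"] + u["subscribers"] + u["posts"] for u in users]
    # random.choices normalizes the weights, giving exactly w_i / sum_j w_j.
    return random.choices(users, weights=weights, k=1)[0]
```

        <p>Formulas (6) and (7) follow the same pattern with view counts (and, for (7), like counts) added to the weight of each post.</p>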
        <p>Stage 5. Subgraph Posts-Published is joined to the general graph.</p>
        <p>Stage 6. An oriented subgraph Posts-Viewed is generated based on the BA model. The probability
that the i-th post is viewed:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i) / ∑_j (n_1,j + n_2,j + n_3,j), (6)</p>
        <p>where n_1,i is the number of friends of the author of the i-th post, n_2,i is the number of subscribers of
the author of the i-th post, n_3,i is the number of views of the i-th post, and the denominator sums these
values over all existing nodes in the network.</p>
        <p>Stage 7. Subgraph Posts-Viewed is joined to the general graph.</p>
        <p>Stage 8. Subgraph Posts-Liked is generated. The probability that the i-th post gets a like:</p>
        <p>p_i = (n_1,i + n_2,i + n_3,i + n_4,i) / ∑_j (n_1,j + n_2,j + n_3,j + n_4,j), (7)</p>
        <p>where n_1,i is the number of friends of the author of the i-th post, n_2,i is the number of subscribers of
the author of the i-th post, n_3,i is the number of views of the i-th post, n_4,i is the number of likes of the
i-th post, and the denominator sums these values over all existing nodes in the network.</p>
        <p>Stage 9. Subgraph Posts-Liked is joined to the general graph.</p>
        <p>Stage 10. Subgraphs Users-Similarity, Posts-Similarity and Posts-Recommended are generated by
the algorithms of the selected recommendation system.</p>
        <p>Studies of the parameters of the generated social graphs [29] (in particular, density, network
diameter, clustering coefficient, etc.) showed their correspondence to the properties of
real social networks.</p>
        <p>Modeling of user behavior was based on the generation of a dynamic graph Users-Ratings-Items.
Stages of the developed method:
Stage 1.  Initialization of user parameters and system items.</p>
        <p>Stage 2.  Creating a "Seed" of social graph.</p>
        <p>Stage 3.  Simulation of time iterations. At each time iteration, a number of users, items, and bots
join the model. Users and bots rate items.</p>
        <p>Stage 4.  Stopping the model and saving the resulting data set to a file.</p>
        <p>Generating ratings for regular users:</p>
        <p>d_u,v = ∑_{i=0}^{n} |f_u,i − f_v,i|, (8)</p>
        <p>r_u,v = Ψ(5 − d_u,v + θ_u + θ_v), (9)</p>
        <p>where d_u,v is the distance between user u and item v in the multidimensional space of hidden factors;
n is the number of hidden factors; f_u,i is the i-th hidden factor of user u; f_v,i is the i-th hidden factor of
item v; θ_u is the user's offset in evaluating items (level of requirements to content); θ_v is the shift of the
item in obtaining ratings (content quality level); Ψ() is a function that converts the resulting fractional
number into a discrete rating from the set [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0].</p>
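        <p>A sketch of the rating model (8)-(9), assuming Ψ rounds to the nearest half point and clamps to the rating set – our reading of the formulas, not the authors' exact implementation:</p>

```python
def psi(x):
    """Map a fractional score to the nearest rating in {0.5, 1.0, ..., 5.0}."""
    return min(5.0, max(0.5, round(x * 2) / 2))

def rate(user_factors, item_factors, user_offset, item_shift):
    """Formulas (8)-(9): L1 distance in hidden-factor space, then a
    shifted score discretized by psi()."""
    d = sum(abs(fu - fv) for fu, fv in zip(user_factors, item_factors))
    return psi(5 - d + user_offset + item_shift)
```

        <p>A user whose hidden factors coincide with the item's gets the maximum rating; every unit of distance in factor space lowers the score, shifted by the user's strictness and the item's quality.</p>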
        <p>Among the existing attack models for software simulation we chose random, average and popular
attacks – since these models are the most commonly used, universal and illustrate different attack
strategies.</p>
        <p>Generating ratings for bots:</p>
        <p>r_u,v = { attackPattern(), if the item is "ordinary"; 5.0, if the item is "target" }, (10)</p>
        <p>where attackPattern() is a function generating ratings for non-target items according to the type of
attack model selected.</p>
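        <p>Rule (10) with the three chosen attack patterns might be sketched as follows; the noise level for the average attack and the fallback system mean are assumptions, and the popular pattern follows the earlier description (filler items get the minimum rating or one step above it, depending on their average):</p>

```python
import random

RATINGS = [0.5 * k for k in range(1, 11)]  # 0.5 .. 5.0

def nearest_rating(x):
    return min(RATINGS, key=lambda r: abs(r - x))

def bot_rating(item, attack="random", system_mean=3.5):
    """Formula (10): target items always get 5.0; filler ("ordinary")
    items are rated by the selected attack pattern."""
    if item["is_target"]:
        return 5.0
    if attack == "random":      # any rating, uniformly at random
        return random.choice(RATINGS)
    if attack == "average":     # near the item's known average rating
        return nearest_rating(item.get("avg_rating", system_mean)
                              + random.gauss(0, 0.5))
    if attack == "popular":     # well-known filler: r_min or r_min + 1
        return 1.0 if item.get("avg_rating", system_mean) >= system_mean else 0.5
    raise ValueError(attack)
```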
      </sec>
    </sec>
    <sec id="sec-7">
      <title>2.3.2. The results of experiments to verify the effectiveness of the proposed method of detecting bot networks</title>
      <p>
        Data sets for experiments were generated in the developed software simulation model of the
recommendation system. The format and statistical features of the data were generated as close as
possible to the corresponding characteristics of MovieLens Datasets [33]. Information attacks were
modeled using random, average, and popular attack models [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2-5</xref>
        ]. The Label Propagation algorithm
was used as a graph clustering algorithm.
      </p>
      <p>In the series of experiments, 5%, 10%, 20% and 30% of bots were generated in different series, all
other users of the system are authentic. In different experiments bots had different numbers of targets to
attack: 1, 5, 10, 15, 20, 25, 30, 35, 40 and 45 targets. Also 20% of profiles with a high level of activity in
the recommendation system were generated among authentic users. Such users gave significantly more
ratings than the average user. System items were generated to belong to one of the 19 clusters.</p>
      <p>The bot search algorithm for the studied recommendation system had the following parameters:</p>
      <p>The target-rating threshold was set to 0.05 – that is, a user is suspected if they have given target
ratings to at least 5% of the items identified by the information security subsystem as probable targets of
the attack. The threshold is small because it is not known how accurately the targets of the attack are
recognized; there may be many items whose ratings have changed naturally.</p>
      <p>The elimination of user profiles from the BotNet subgraph is based on the values of the
variance of ratings and the variance of time intervals between ratings. Authentic users who mistakenly
get into the BotNet subgraph are users whose profile data satisfy the following rule:</p>
      <p>|σ_u − σ_mean| &lt; 0.15 AND (σ_Δt,u &gt; 72 AND σ_Δt,u &lt; 600),</p>
      <p>where σ_u is the variance of ratings in the user profile, σ_mean is the truncated mean of the variance of
ratings over the system's user profiles (30% of the extreme values were truncated, i.e., a value equal to
the maximum possible percentage of bots in the system in question), and σ_Δt,u is the variance of the
time intervals between the user's ratings. The variance of ratings in the profiles of authentic users does
not differ significantly from the average variance (an attacker cannot know this value, only estimate it,
and so cannot accurately reproduce this characteristic when creating bot profiles). The check of the
variance of time intervals between ratings is as follows: users whose time intervals between ratings are
too uniform or too varied should be considered suspicious, while for authentic users this variance
usually lies within a certain range.</p>
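      <p>The elimination rule translates directly into a profile filter; a sketch, assuming population variance over the ratings and over the gaps between consecutive rating timestamps (the paper does not specify the exact variance estimator):</p>

```python
from statistics import pvariance

def rating_variance(profile):
    return pvariance(profile["ratings"]) if len(profile["ratings"]) > 1 else 0.0

def interval_variance(profile):
    ts = sorted(profile["timestamps"])
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return pvariance(gaps) if len(gaps) > 1 else 0.0

def is_probably_authentic(profile, mean_var):
    """Rule for removing a profile from the BotNet subgraph:
    |sigma_u - sigma_mean| < 0.15 AND 72 < interval variance < 600."""
    return (abs(rating_variance(profile) - mean_var) < 0.15
            and 72 < interval_variance(profile) < 600)
```

      <p>A bot that posts ratings on a fixed schedule fails the interval check (variance near zero), even if its rating variance happens to match the system average.</p>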
      <p>Table 1 shows the results of a series of experiments verifying the accuracy of the developed method
of identifying bot profiles. The following notations of attack models are used:
RA – random attack, AA – average attack, PA – popular attack.</p>
      <p>To assess the quality of the developed method the following metrics were selected.</p>
      <p>Precision – a measure that characterizes how many positive triggers of the information security
subsystem of bot detection are correct. It was calculated by the formula:</p>
      <p>precision = TP / (TP + FP), (11)</p>
      <p>where TP is the number of correctly detected bots and FP is the number of authentic users incorrectly
identified as bots.</p>
      <p>Recall (or Sensitivity) – a measure that characterizes the ability of the information security
subsystem to correctly detect the profiles of bots without taking into account false positives. It was
determined by the formula:</p>
      <p>recall = TP / (TP + FN), (12)</p>
      <p>where FN is the number of bots incorrectly identified as authentic users.</p>
      <p>RMSE – the root mean square error of classifying users as bots or authentic:</p>
      <p>RMSE = √((1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²), (13)</p>
      <p>where y_i is the true class of the i-th user, ŷ_i is the predicted class, and N is the number of users.</p>
      <p>As shown by the series of experiments in Table 1, the precision of bot detection by the developed
method averages 0.72 for a random attack, 0.81 for an average attack, and 0.71 for a popular attack.</p>
      <p>The accuracy of bot recognition by the developed method for the popular attack model was also
investigated in more detail (Table 2), which gives the average values of the accuracy of the developed
method for different numbers of information attack targets per bot.</p>
      <p>The experiments have shown that for a popular attack the bot detection precision of the developed
method averages 0.71, recall is 0.82, and RMSE is 0.22. The worst results were obtained when the bots
attacked a single target; in that case the accuracy of the developed method dropped to 0.57 on average.
The highest precision was observed when there were 25-30 attack targets, in which case it reached 0.78.</p>
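      <p>Metrics (11)-(13) over binary bot/authentic labels (1 = bot, 0 = authentic) can be computed as in this sketch:</p>

```python
from math import sqrt

def evaluate(y_true, y_pred):
    """Precision (11), recall (12) and RMSE (13) for a binary
    bot/authentic classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return precision, recall, rmse
```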
    </sec>
    <sec id="sec-8">
      <title>3. Conclusions</title>
      <p>A method of detecting botnets in the recommendation system based on graph clustering and analysis
of user actions was developed. This method allows botnets to be detected and distinguished by their sets
of attack objects. Appropriate algorithms have also been developed, along with an information security
subsystem of the recommendation system. This subsystem consists of a method of detecting an
information attack on the recommendation system based on trend analysis of item ratings and a method
of detecting botnets in the recommendation system based on graph clustering and analysis of user
actions. In contrast to existing methods, the developed information security subsystem provides higher
robustness of the system to external destabilizing factors represented by information attacks.</p>
      <p>Our experiments have shown that the precision of bot detection by the developed method averages
0.72 for a random attack, 0.81 for an average attack, and 0.71 for a popular attack. The worst results
were obtained when the bots attacked a single target; in that case the accuracy of the developed method,
for example for a popular attack, averaged 0.57. The highest precision was observed when there were
25-30 attack targets, reaching 0.78 for a popular attack. In the case of a successful bot attack that shifted
target ratings, the presence of a botnet (and a significant percentage of its bots) was always detected.
All user profiles identified as bots should be verified using existing bot identification methods based on
statistical analysis of individual profile data. This allows the results to be refined and the user profiles
mistakenly identified as bots to be removed from this set.</p>
    </sec>
    <sec id="sec-9">
      <title>4. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Punam</surname>
          </string-name>
          ,
          <article-title>A Comprehensive Study of Shilling Attacks in Recommender Systems</article-title>
          ,
          <source>IJCSI International Journal of Computer Science Issues</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          ). URL: https://www.ijcsi.org/articles/A-comprehensive-study-of-shilling-attacks-in-recommendersystems.php.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <article-title>Shilling attack models in recommender system</article-title>
          ,
          <source>in: Proceedings of the International Conference on Inventive Computation Technologies (ICICT)</source>
          ,
          Coimbatore
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://ieeexplore.ieee.org/document/7824865/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaleli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bilge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Polat</surname>
          </string-name>
          ,
          <article-title>Shilling attacks against recommender systems: a comprehensive survey</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>42</volume>
          (
          <year>2014</year>
          ). URL: https://doi.org/10.1007/s10462-012-9364-9.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Kantor</surname>
          </string-name>
          (Eds.),
          <source>Recommender Systems Handbook</source>
          , Springer, Boston,
          <year>2011</year>
          . URL: https://doi.org/10.1007/978-0-387-85820-3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhaumik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Effective attack models for shilling itembased collaborative filtering system</article-title>
          ,
          <source>in: Proceedings of the 2005 WebKDD Workshop</source>
          , held in
          <source>conjuction with ACM SIGKDD</source>
          <year>2005</year>
          , Chicago, Illinois,
          <year>2005</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . URL: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.7759&amp;rep=rep1&amp;type=pdf#page=23.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Shilling recommender systems for fun and profit</article-title>
          ,
          <source>in: Proceedings of the 13th International World Wide Web Conference</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>402</lpage>
          . URL: https://dl.acm.org/doi/10.1145/988672.988726.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Kurban</surname>
          </string-name>
          ,
          <article-title>Modern information wars in social online networks</article-title>
          ,
          <source>Information Society 23</source>
          (
          <year>2016</year>
          )
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          . URL: http://nbuv.gov.ua/UJRN/is_2016_23_15. (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Gubanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Novikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Chhartishvili</surname>
          </string-name>
          ,
          <article-title>Social networks: models of information influence, management and confrontation</article-title>
          ,
          <source>Publishing House of Physical and Mathematical Literature</source>
          , Moscow,
          <year>2010</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V. P.</given-names>
            <surname>Horbulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. H.</given-names>
            <surname>Dodonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Lande</surname>
          </string-name>
          ,
          <article-title>Information operations and security of society: threats, counteraction</article-title>
          , modeling: monograph, Intertechnology, Kyiv,
          <year>2009</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Ostapenko</surname>
          </string-name>
          ,
          <article-title>Information operations and attacks in socio-technical systems</article-title>
          , Hotline - Telecom, Moscow,
          <year>2007</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Bohush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. K.</given-names>
            <surname>Yudin</surname>
          </string-name>
          ,
          <article-title>Information security of the state</article-title>
          , MK-Press, Kyiv,
          <year>2005</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Ulichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ye. V.</given-names>
            <surname>Meleshko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sawicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Smailova</surname>
          </string-name>
          ,
          <article-title>Computer modeling of dissemination of informational influences in social networks with different strategies of information distributors</article-title>
          ,
          <source>in: Proc. SPIE 11176</source>
          , Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments
          , Wilga, Poland,
          <year>2019</year>
          ,
          111761T
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhaumik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness</article-title>
          ,
          <source>ACM Transactions on Internet Technology</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ) (
          <year>2007</year>
          ). URL: https://doi.org/10.1145/1278366.1278372.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>