Co-rating Attacks on Recommendation Algorithms

               Manfred Moosleitner                            Günther Specht                         Eva Zangerle
              Department of Computer                       Department of Computer               Department of Computer
                     Science                                      Science                              Science
               Universität Innsbruck,                       Universität Innsbruck,               Universität Innsbruck,
                      Austria                                      Austria                              Austria

                                                       firstname.lastname@uibk.ac.at

ABSTRACT                                                                   rative filtering recommender algorithms, rating predictions
Online shops, streaming services, and booking systems use                  are frequently computed based on user similarity, which is
algorithms to recommend items from their stock to users.                   captured by co-rated items of a pair of users. This opens
These recommendations are often calculated based on the                    a possibility for manipulations of the computed recommen-
interactions of other users with the items, e.g., buying a                 dations, as the ratings stem from the user base and their
product or watching a movie. This creates an attack point                  interactions with the system. One type of possible manipu-
where the outcome of recommendation algorithms can be                      lations is shilling attacks [11], where additional interactions
purposefully manipulated by manually or automatically cre-                 are injected into the system to alter the recommendations
ated user interactions, aimed to raise or lower the relevance              computed by the system. One famous example of such a
of specific items. We study the attackability of recommender               shilling attack at Amazon was reported on December 7th in
algorithms by simulating a series of attacks on six recom-                 2002 by the British online news service “The Register”1 . On
mendation algorithms, using three attack strategies, and two               Amazon, users are provided with recommendations in the
opposing attack objectives. We run these experiments with                  form of “Customers who viewed this article, also viewed ...”.
varying numbers of co-ratings per attack and evaluate the                  To manipulate these recommendations, a group of attackers
overall item ranking and an average change in average rank.                interacted with two different books multiple times, aiming
Our results show that the effort required of and the efficiency            to make the second book appear in the recommendation sec-
reached by the attacks greatly depends on the strategy, ob-                tion of the first book, although it diverged in content and
jective, and recommendation algorithm. Additionally, the                   genre. It is unknown how many people were involved exactly
calculated average change in average rank provides an indi-                and how often the two books were co-viewed, but this exam-
cator about the attackability of recommendation algorithms.                ple shows vulnerabilities that can be abused to manipulate
We find that neighborhood- and cluster-based algorithms                    the outcome of recommender systems.
show a higher vulnerability against attacks compared to al-                   For this paper, we focus on co-rating attacks, a specific
gorithms based on matrix factorization.                                    form of shilling attacks. A co-rating attack means that a
                                                                           single attack in an attack series consists of two ratings, e.g.,
                                                                           one rating of user B for item X and one rating of user B
Keywords                                                                   for item Y, to influence the rating behavior of the targeted
Recommender Systems, Co-Rating Attacks, Attack Mea-                        system, possibly for one specific user A. We call user A the
surements, Manipulation, Bias                                              target user and user B the auxiliary user. Similarly, we call
                                                                           item X the target item and item Y the auxiliary item, and the
1.    INTRODUCTION                                                         attack aims to raise the predicted ratings for the target item
                                                                           X. The contribution of this work is two-fold: (i) we provide
   Recommender systems are ubiquitous in today’s online                    a systematic evaluation of the vulnerability of recommen-
world as they provide users of, for instance, online shops,                dation algorithms against co-rating attacks and the effort
video, and music streaming services with recommendations                   required for such attacks to be successful, and (ii) we pro-
of items that might be interesting to them [21, 23]. Such rec-             pose a new metric to measure the attackability and hence,
ommendations are computed based on interactions of users                   the vulnerability (respective resilience) of recommendation
with items. An interaction can be, e.g., a user who rates an               algorithms against different co-rating attack strategies.
item with five stars, often called collaborative filtering [22].
Based on interactions between users and items, in collabo-
                                                                           2.   RELATED WORK
                                                                             In the following, we discuss work related to attacks on
                                                                           recommender systems and measures that aim to quantify
                                                                           these attacks.
                                                                             Lam et al. [12] classified three different areas where at-
                                                                           tacks on recommender systems can happen. Firstly expo-
32nd GI-Workshop on Foundations of Databases (Grundlagen von Daten-        sure, where systems get breached, and private data from
banken), September 01-03, 2021, Munich, Germany.                           1
Copyright © 2021 for this paper by its authors. Use permitted under Cre-     https://www.theregister.co.uk/2002/12/07/
ative Commons License Attribution 4.0 International (CC BY 4.0).           sodomites_overrun_amazon_com/
users is leaked. Secondly, the authors describe sabotage as       to be successful if the targeted item reaches the top-10 rec-
an attack to hinder the service of the targeted system at         ommendations, as this increases the visibility of the item and
all and give denial of service as an example. Thirdly, they       previous research has shown that a list of ten recommenda-
mention the area of bias, where the attacks focus on pur-         tions can be a sensible choice in regards to set attractiveness
posefully changing the ratings of the recommender system.         and choice difficulty [2]. For pull attacks, we consider an
The former two types, exposure, and sabotage are out of the       attack to be successful if the rank of the target item, after
scope of this work, as they are more in the area of classic se-   the attack, is at least ten ranks lower than before the attack.
curity topics. We focus on the manipulation of bias instead.      This would pull the targeted item out of the top-10 and de-
Bias in our context can be, e.g., that a popular item is given    creases its visibility. We define the impact of an attack as
a higher relevance when calculating the recommendations,          the change in rank of the target item in the average item
purely due to its popularity. This is in contrast with the        ranking.
main purpose of recommender systems, namely, to suggest              For our experiments, we rely on the MovieLens 100K
items to users that fit their personal taste.                     dataset [8]2 as the basis, which is an established dataset
   Jannach et al. [9] categorize attacks into three attack di-    widely used in the recommender systems research commu-
mensions. They termed the first category push attacks; their      nity [18, 6, 13, 19, 16, 20, 14, 1].
purpose is to increase the predicted ratings for a specific          Our experiments are organized in individual steps. A step
item. The attacks in the second category are nuke attacks,        can be viewed as the current state of the experiment, includ-
which aim to lower the predicted ratings for a specific item.     ing the dataset, the training of the algorithm, and the pre-
Please note that we refer to these attacks as pull attacks.       dicted ratings. For each recommender algorithm, we start
The last category described serves the purpose of rendering       with zero co-ratings, train the algorithm on the unmodified
the recommender system ineffective and unpredictable. One         dataset, and calculate the rankings per item for each user
additional dimension for attacks lies in the strategy when se-    and the average rank per item over all users. We refer to
lecting the auxiliary user and item. Mobasher et al. [15] give    this as step 0, and subsequently, add a single co-rating to
a detailed overview of different strategies, which focuses on     the dataset (and hence, run an attack); we refer to this as
the selection of the auxiliary items by using statistics about    step 1 and repeat this procedure. In each step, we compute
the data. Some of the strategies described choose auxiliary       the change in the rank and the average rank for the tar-
items from the same category and genre as the target item,        get item. We hypothesize that a noticeable effect should be
other strategies use the popularity of the item or choose         reached within 100 iterations, therefore using this number
them randomly. In contrast, we focus on the selection of the      as the maximum number of steps in our experiments.
auxiliary user.                                                      We analyze the experiments along the four dimensions
   Early works [11, 18, 16, 19] and more recent publications      recommendation algorithm, attack strategy, attack purpose,
[4, 1] feature attacks only on variations of the K-Nearest-       and the number of steps, where the attack purpose is defined
Neighborhood (KNN) algorithms. In this work, we addi-             as whether the goal of the attack is to lower or raise the rank
tionally consider algorithms based on matrix factorization        of the target item, whereas the attack strategy describes how
(MF) [14], singular value decomposition (SVD) [10], co-           the auxiliary user and the auxiliary item are selected.
clustering [6], and “popularity differential” [13].                  The configurations for single experiments have been com-
   Burke et al. [3] state that a popular way to measure the       posed such that they cover a broad spectrum of the full
robustness of a recommendation algorithm against attacks          experiment space. We describe these dimensions in the fol-
is the average prediction shift, which measures the average       lowing.
shift of the ratings predicted by the algorithm across all
users. The prediction shift reflects whether an attack has        3.1    Recommendation Algorithms
the intended effect, but not how large the impact is. As             As the main focus of this work lies in examining how
a solution, they propose using the average hit ratio, which       different recommendation algorithms behave when exposed
counts the number of target items that appear in the list of      to co-rating attacks, we evaluate six recommendation algo-
recommendations of the target user.                               rithms3 . We selected two k-Nearest Neighbor [17] variations,
   O’Mahony et al. [20] analyze the performance of attacks        KNNBasic and KNNWithMeans 4 , where the neighborhood
using HitRatio and prediction shift, but only on a KNN-           is built using the k-most similar users with respect to the
based recommender. Lam et al. [12] provide an overview            user ratings. In our case, mean squared difference [17] was
of different possibilities to attack a recommender system,        used to calculate the similarity. The kNN-based algorithms
but they do not present a quantitative analysis. We use a         were chosen because they are fairly simple, are commonly
similar approach as O’Mahony et al. [19], but evaluate the        used as a baseline, and it should be easy to find a working
attacks for multiple recommendation algorithms and multi-         attack strategy for them.
ple attack strategies, and measure the effect of the intensity       SlopeOne [13] is also a simple algorithm, where the ratings
of an attack.                                                     for a user A are computed using the ratings of other users
                                                                  who share rated items with A. E.g., user A rated item X
3.   METHODOLOGY                                                  and Y, and user B only rated item X. The rating difference
  The general idea of our work is to capture the effort re-       of user B is added to the rating of user A’s rating for item
quired for an attack to be successful and the impact of co-       X to predict the rating of user A for item Y.
rating attacks on the predicted ratings. The required effort      2
                                                                    https://grouplens.org/datasets/movielens/100k/
can be measured by the number of co-ratings used in an at-        3
                                                                    For the implementation, we relied on the Python Surprise
tack, i.e., the number of interactions of the auxiliary user      library https://github.com/NicolasHug/Surprise
with the target item and the auxiliary item in the form of        4
                                                                    Here the predicted rating is added to the mean rating across
two rating entries in the dataset. We consider a push attack      all users.
   CoClustering [6] was selected because it does not only use        2 shows some statistics about the data before co-ratings are
user- and item-clusters, but also the co-clusters which should       added and the chosen user and movies. The data collected
make it harder to find a working attack strategy compared            shows that the target-user is quite active, with more than
to KNN-based approaches.                                             double the number of ratings than the average user and more
   Singular Value Decomposition (SVD) [10] aims to recon-            than four times the number of ratings than the median over
struct the rating matrix (user × items) from a matrix repre-         all users. We can also see that the mean ratings and the
senting the user’s latent factors and a second matrix, repre-        standard deviation for the target user are higher than the
senting the latent factors of the items. Non-Negative Matrix         mean rating for all users. The data for the auxiliary movie
Factorization (NMF) [14] also relies on matrix factorization,        shows that the movie is rated more often, but only a little
which is used by Netflix [7] and Youtube [5].                        lower, than the average movie. The target movie has only a
   To assess the difference in the performance of the cho-           single rating in the original data, which should be beneficial
sen algorithms, the algorithms were evaluated using five-fold        for our attacks.
cross-validation using the MovieLens dataset and the con-               An attack is now realized by the auxiliary user awarding
figuration parameters as they were used in our experiments.          high ratings for both movies (and hence, co-rating these)
The resulting Mean Absolute Error (MAE) and Root Mean                until the target item appears in a prominent position in the
Squared Error (RMSE) of the predicted ratings in Table 1             target user’s list of recommended movies.
show that the performances of the algorithms are similar.               For running the experiments, in the first step, the raw
                                                                     dataset is used to train the recommender algorithms to be
                              RMSE                 MAE               evaluated. These trained models are then used to predict
                          mean     std        mean     std           the ratings for all items for all users, from which we infer a
          KNNBasic        0.9785 0.0053       0.7105 0.0036          ranking of all items for each user. These per-user rankings
      KNNWithMeans        0.9507 0.0032       0.7491 0.0023          are then used to compute the average rank for each item
               SVD        0.9369 0.0044       0.7386 0.0040
                                                                     over all users. In the next step, one co-rating is added to
               NMF        0.9628 0.0038       0.7569 0.0030
           SlopeOne       0.9447 0.0024       0.7423 0.0025
                                                                     the dataset. Subsequently, the extended dataset is used for
        CoClustering      0.9655 0.0050       0.7561 0.0059          computing updated rankings for all items for all users and
                                                                     the average rank over all users. This procedure is repeated
  Table 1: Five-fold cross-validation of the evaluated algorithms.   100 times for each algorithm, providing the rank of the tar-
                                                                     get item for the target user and the average rank over all
                                                                     users, and also the change in rank between the individual
3.2    Attack Strategies                                             steps, to show the effect of the attack from one step to the
                                                                     next.
   In the following, we investigate three different co-rating
attack strategies. Our first attack strategy is the baseline
approach (BASIC), where the auxiliary user is created as a
                                                                     3.3    Attack Purposes
fresh new user, i.e., a user that has not provided any rat-             We evaluate the three attack strategies for both pull at-
ings so far. The second attack strategy is the user activity         tacks (aiming to lower the predicted rating of the target
approach (ACT), where the auxiliary user is selected based           item) and push attacks (aiming to raise the predicted rating
on the rating activity of the user—the dataset is analyzed           of the target item). For push attacks, it is sufficient to pick
at runtime, and the most active user is determined and mis-          an item that has not been rated by the target user, as the
used as the auxiliary user, based on the data in step 0. The         target item. For pull attacks, in contrast, we computed the
third attack strategy is the user similarity approach (SIM),         recommendation lists, based on the data in step 0, and chose
where the user most similar to the target user is selected as        the top-ranked item as target item.
the auxiliary user. The core idea here is that user similarity
metrics are also used in collaborative filtering recommender         3.4    Quantifying Attackability
algorithms [22] and hence, similarly choosing the auxiliary             Here we introduce our novel attackability metric, which
user seems promising. As for the user similarity compu-              takes into account the effort and the effect of an attack. We
tation, we rely on the cosine similarity of the user rating          apply this metric to the data collected during our experi-
vectors. For all three attack strategies, the same auxiliary         ments and present the results in Section 4.3.
user is used in all steps. It is important to note that we              Besides investigating the manipulation of the rank of an
examine the effect of co-rating attacks on recommendation            item for a single user, we are also interested in the impact
algorithms alone and not whole recommender systems, thus             of the co-rating attacks on the average ratings of each item
using the full knowledge about the dataset in the attack             across all users. Thus, we compute an average ranking for all
strategies.                                                          items (cf. Section 3.2), which is used to determine the aver-
   In all approaches, an arbitrary user was chosen as target         age rank of the target item over all users. Furthermore, we
user—i.e., we aim to measure the impact of the co-rating             aim to quantify the attackability of recommendation algo-
attacks on the movie rankings for this user. For our experi-         rithms. Hence, we propose to consider the absolute average
ments, the user with user-id 1 was picked as the target user.        change in the average rank of the target item. Consider the
The movie “The Truth About Cats & Dogs (1996)” (movie                target item being ranked at position 1000 at step 0, and in
id 111) was randomly selected as the auxiliary item from             step 1 ranked at 100, then the change in ranks is 900. We
the set of movies the target user rated with five stars. The         compute this change across all steps and compute the aver-
movie “Scream of Stone (Schrei aus Stein) (1991)” (movie             age change in rank for the target item. Formally, we define
id 1682) was selected as the target item, because it was the         Ri as the average item rank for step i, i.e., R0 for step 0,
newest movie in the dataset (highest movie id) and because           etc. We further define ranki as the rank of the chosen tar-
the movie has not been rated by the target user yet. Table           get item in the corresponding average item ranking Ri . The
                                                                                Ratings
                                               Min       Q1     Median        Q3 Max            Mean       Std      Count
                           Full Dataset         1        3        4           4     5           3.53       1.13    100,000
                            Target User         1        3        4           5     5           3.61       1.26      272
                            Aux. Movie          1        3        4           4     5           3.49       0.96      272
                           Target Movie         3        3        3           3     3           3.00       0.00       1
                                                      (a) Statistics about the ratings in the data.

                                                                          Number of Ratings
                                            Min      Q1       Median       Q3 Max Mean                     Std      Count
                             Per User       20       33        65          148 737 106.05                 100.93     943
                            Per Movie        1        6        27           80   583    59.45             80.38     1,682
                                             (b) Statistics about the number of ratings in the full dataset.


Table 2: Statistics about the data and the chosen user and movies for different aggregations. Shown are the five-number summary, mean, and
standard deviation of the rating values in Table 2a and of the number of ratings in Table 2b. The column Count shows the number of rows for
                            each aggregation, and the ratings for the auxiliary movie are from 272 individual users.


change in rank is then defined as the difference between two                    KNNWithMeans quickly arrive at the top ranks. NMF and
steps: ∆ranki,i+1 = ranki+1 − ranki . The average change                        SVD show some erratic behavior and SlopeOne converges
in rank is computed based on the sum of all ∆ranki,i+1 of                       shortly after reaching rank 200. CoClustering was attacked
the available steps, as shown in Equation 1, where n is the                     successfully again since a rank in the top ten was clearly
number of steps to be used.                                                     reached. The attacks on KNNBasic and KNNWithMeans
                                                                                also reach a high rank but did not reach the top-10. NMF,
                             n−1
                             X   ∆rank                                          SVD, and SlopeOne could not be attacked successfully. Gen-
                                            i,i+1
                 ∆rank =                                            (1)         erally, we observe that the activity-based attack strategy
                                        n
                             i=0
                                                                                reaches a higher attack effect than the basic attack approach.
                                                                                  The third set of experiments used the similarity between
4.    RESULTS AND DISCUSSION                                                    the users to select the target user; the results are shown in
                                                                                Figure 1c. We observe a tendency towards higher ranks for
   In our experiments, we run multiple attacks, using mul-                      NMF and SVD, but the erratic behavior starts to dominate
tiple dimensions, adding up to a total of 1,800 individual                      early on. In addition to CoClustering, KNNBasic, and KN-
trained prediction models, which were used in 3,600 exper-                      NWithMeans, also SlopeOne changes to a high rank quickly.
iments. In the following, we present the results of these                       Using the similarity to select the target user leads to the
evaluations.                                                                    highest ranks for the majority of algorithms. All algorithms
4.1   Push Attack Evaluation                                                    except for NMF and SVD were attacked successfully.
   For this evaluation, we analyze the rank of items in the
recommendation list for each step. Here, the best achiev-                       4.2      Pull Attack Evaluation
able rank in the ranking is zero. As stated in Section 3, we                       In the following, we evaluate pull attacks, where an at-
consider a push attack successful if the attack changes the                     tack is considered successful if the rank of the target item is
rank prediction of the target item to be in the top-10 of the                   consistently lowered by 10 or more ranks after an attack.
average item ranking.                                                              For the BASIC attack approach, a fresh user is used as the
   The first set of experiments investigate the BASIC attack                    auxiliary user. As the target item, the movie with the high-
strategy, where a new user is created as the auxiliary user                     est id from the set of movies the target user has not rated was
and the auxiliary item was randomly selected from the set                       selected. Figure 2a shows the obtained results. SlopeOne
of movies that have not been rated by the target user.                          is mostly unaffected by the pull attack. The effect on SVD
   Figure 1a shows that the average rank for CoClustering is                    was rather small and shows minor fluctuations. The average
influenced most as the curve races directly to the top ranks                    rank for KNNWithMeans moves slowly and converges early
within the first few steps. The results for NMF and SVD                         around 110. CoClustering, KNNBasic, and NMF show high
show capricious behavior within a confined area, but do not                     vulnerability against the attacks. Most interesting is the be-
reach a high rank. The ranks for KNNBasic and KNNWith-                          havior of NMF showing a clear tendency towards the lower
Means show that they are only influenced at the start but                       ranks when compared to SVD, the other MF-based algo-
converge quickly. SlopeOne is unaffected by the attack with                     rithm, whose behavior showed to be stable and unaffected
no changes in the rank. For NMF and SVD, the attacks can                        by the attacks.
be regarded as failed, but the ranking values have a high                          The next set of experiments analyzes the activity-based
variance, which goes in the direction of sabotage, as intro-                    approach with pull attacks. Figure 2b shows that NMF
duced in Section 2. Only the attack on CoClustering was                         and SVD are only marginally affected as both algorithms
successful, showing a high attackability of the basic attack                    show only minor fluctuations. CoClustering, KNNBasic,
strategy.                                                                       KNNWithMeans, and SlopeOne show a clear tendency to-
   The second set of experiments aims at investigating the                      wards the lower ranks.
user activity attack strategy; Figure 1b shows the obtained                        The last set of experiments used the similarity-based ap-
results. We can see that the values for all algorithms rush                     proach to launch pull attacks on the algorithms. In Fig-
towards the top at the start. CoClustering, KNNBasic, and                       ure 2c, we observe that the results are quite similar to the
                                                               Basic                                                                                        User Activity                                                                                      User Similarity
                                                    Rank-Step-Graph Push Attack                                                                     Rank-Step-Graph Push Attack                                                                         Rank-Step-Graph Push Attack
                                                                             CoClustering                                                                                        CoClustering                                                                                    CoClustering
                                     1000                                    KNNBasic                                                                                            KNNBasic                                                                                        KNNBasic
                                                                             KNNWithMeans                                                                                        KNNWithMeans                                                                                    KNNWithMeans
                                                                             NMF                                                       1000                                      NMF                                                       1000                                  NMF
                                                                             SVD                                                                                                 SVD                                                                                             SVD
       Average Rank of target-item


                                                                                                         Average Rank of target-item


                                                                                                                                                                                                             Average Rank of target-item
                                      800                                    SlopeOne                                                                                            SlopeOne                                                                                        SlopeOne
                                                                                                                                        800                                                                                                 800

                                      600
                                                                                                                                        600                                                                                                 600

                                      400                                                                                               400                                                                                                 400

                                      200                                                                                               200                                                                                                 200

                                        00      20         40           60      80          100                                           00   20          40           60          80          100                                           00    20         40           60      80          100
                                                                Steps                                                                                           Steps                                                                                               Steps

                                                (a) Basic attack approach.                                                                      (b) User activity approach.                                                                        (c) User similarity approach.


                                                                              Figure 1: Push attacks: average rank for the different co-rating attack strategies.


                                                           Basic                                                                                       User Activity                                                                                       User Similarity
                                                 Rank-Step-Graph Pull Attack                                                                    Rank-Step-Graph Pull Attack                                                                          Rank-Step-Graph Pull Attack
                              1200                                                                                                                                                                                                  1400
                                                                                                                                1400
                                                                                                                                                                                                                                    1200
                              1000                                                                                              1200
Average Rank of target-item


                                                                                                  Average Rank of target-item


                                                                                                                                                                                                      Average Rank of target-item
                                                                                                                                                                                                                                    1000
                                     800                                     CoClustering                                       1000                                             CoClustering                                                                                    CoClustering
                                                                             KNNBasic                                                                                            KNNBasic                                                                                        KNNBasic
                                                                             KNNWithMeans                                              800                                       KNNWithMeans                                              800                                   KNNWithMeans
                                     600                                     NMF                                                                                                 NMF                                                                                             NMF
                                                                             SVD                                                                                                 SVD                                                                                             SVD
                                                                             SlopeOne                                                  600                                       SlopeOne                                                  600                                   SlopeOne
                                     400
                                                                                                                                       400                                                                                                 400
                                     200                                                                                               200                                                                                                 200

                                       00      20         40            60     80       100                                              00    20         40           60          80       100                                              00    20         40           60      80       100
                                                                Steps                                                                                          Steps                                                                                               Steps

                                                (a) Basic attack strategy.                                                                     (b) User activity strategy.                                                                         (c) User similarity approach.


                                                                   Figure 2: Pull attacks: ranks in the average ranking for the different co-rating attack strategies.


         outcome of the experiment with the activity-based approach.                                                                                                        attack strategy and attack purpose, hence describing the at-
         SVD and NMF only show single spikes in the change of the                                                                                                           tackability of the algorithm against co-rating attacks.
         average rank. For the remaining algorithms, the average
         rank of the target item changes rapidly to a lower rank.                                                                                                                             Push attacks                                                                  Pull attacks
                                                                                                                                                                             Algorithm    BASIC ACT SIM                                                               BASIC ACT SIM
         4.3                                 Attackability Evaluation                                                                                                        CoClustering  7.14   7.14   7.14                                                          10.43 10.51 10.52
           Table 3 shows the attackability evaluation measured by                                                                                                            KNNBasic      2.78   9.57   9.69                                                           4.29 15.24 14.95
         the ∆rank measure (cf. Section 3.4) for the data from our                                                                                                           KNNWithMeans 1.15    9.45   9.45                                                           1.11 15.42 10.78
         experiments for push and pull attacks, and each of the three                                                                                                        NMF           4.09   7.40 10.50                                                           11.79 0.01      0.01
         attack strategies used.                                                                                                                                             SVD           3.50   5.91   6.13                                                           0.23    0.03   0.02
           For push attacks, we observe a relation between the at-                                                                                                           SlopeOne      0.06   7.74   9.27                                                           0.00 11.48 7.92
         tackability, i.e., if an algorithm could be attacked success-
                                                                                                                                                                               Table 3: Attackability for push and pull attacks, where brighter
         fully, and the average change in average rank for the cor-                                                                                                          colors represent a lower change in average ∆ rank and darker colors
         responding algorithms. The relation also holds for a lower                                                                                                                        signal a higher change in average ∆ rank.
         observed attackability, i.e., whether an algorithm could not
         be attacked successfully, and lower values for the average
         change in average rank. For pull attacks, we also see the
         same relation between our observed attackability and the                                                                                                           5.       CONCLUSION
         calculated average change in average rank. Even though the                                                                                                            We investigated the attackability of recommendation algo-
         push attacks on SVD and NMF were mostly considered un-                                                                                                             rithms against co-rating attacks. We ran experiments with
         successful, the average change in average rank is still higher                                                                                                     a varying number of co-ratings, approaches, and types of
         due to the erratic behavior the algorithms showed.                                                                                                                 attack for the chosen algorithms. This added up to a total
           From this, we can state that our newly proposed metric,                                                                                                          of 1,800 prediction models, which were used in 3,600 ex-
         the average change in average rank, is an intuitive and feasi-                                                                                                     periments. In these experiments, we investigated the effort,
         ble indicator for how vulnerable an algorithm is to a specific                                                                                                     the number of co-ratings, and the effect of the attacks on
the results of the algorithm, where we also introduce a new         Computer, 42(8):30–37, 2009.
metric to express the attackability from the collected data,   [11] S. K. Lam and J. Riedl. Shilling Recommender
by calculating the average change in average rank over all          Systems for Fun and Profit. In Proceedings of the 13th
steps. We find that the nearest neighbor and co-clustering          International Conference on World Wide Web, WWW
algorithms were the least resilient algorithms. Furthermore,        ’04, pages 393–402, New York, NY, USA, 2004. ACM.
we observe that the matrix factorization-based algorithms      [12] S. K. T. Lam, D. Frankowski, and J. Riedl. Do You
show erratic behavior when attacked with push attacks, but          Trust Your Recommendations? An Exploration of
generally showed a lower attackability when assaulted with          Security and Privacy Issues in Recommender Systems.
pull attacks. Our results show that a handful of coordinated        In G. Müller, editor, Emerging Trends in Information
users are enough to manipulate the outcome of recommen-             and Comm. Secur., pages 14–29. Springer, 2006.
dation algorithms, thus having an impact on which products     [13] D. Lemire and A. Maclachlan. Slope One Predictors
we may buy, movies we watch, or locations we visit for our          for Online Rating-Based Collaborative Filtering. In
next holiday. For future work, we aim to investigate the in-        Proceedings of the 2005 SIAM Int. Conference on
fluence of the number of ratings when choosing target-user          Data Mining, pages 471–475. SIAM, 2005.
and -movie, and auxiliary user. Furthermore, we aim to test    [14] X. Luo, M. Zhou, Y. Xia, and Q. Zhu. An Efficient
the generalizability of our approach and metric by running          Non-Negative Matrix-Factorization-Based Approach
the experiments using different target-users and -items, and        to Collaborative Filtering for Recommender Systems.
on different and larger datasets.                                   IEEE Transactions on Industrial Informatics,
                                                                    10(2):1273–1284, 2014.
6.   REFERENCES                                                [15] B. Mobasher, R. Burke, R. Bhaumik, and C. Williams.
 [1] F. Aiolli, M. Conti, S. Picek, and M. Polato. Big              Toward Trustworthy Recommender Systems: An
     Enough to Care Not Enough to Scare! Crawling to                Analysis of Attack Models and Algorithm Robustness.
     Attack Recommender Systems. In European                        ACM Trans. Internet Technol., 7(4):23–es, Oct. 2007.
     Symposium on Research in Computer Security, pages         [16] B. Mobasher, R. Burke, C. Williams, and
     165–184. Springer, 2020.                                       R. Bhaumik. Analysis and Detection of
 [2] D. Bollen, B. P. Knijnenburg, M. C. Willemsen, and             Segment-Focused Attacks Against Collaborative
     M. Graus. Understanding Choice Overload in                     Recommendation. In O. Nasraoui, O. Zaı̈ane,
     Recommender Systems. In Proceedings of the Fourth              M. Spiliopoulou, B. Mobasher, B. Masand, and P. S.
     ACM Conference on Recommender Systems, RecSys                  Yu, editors, Advances in Web Mining and Web Usage
     ’10, page 63–70, New York, NY, USA, 2010.                      Analysis, pages 96–118. Springer, 2006.
     Association for Computing Machinery.                      [17] X. Ning, C. Desrosiers, and G. Karypis. A
 [3] R. Burke, M. P. O’Mahony, and N. J. Hurley. Robust             Comprehensive Survey of Neighborhood-Based
     Collaborative Recommendation. In F. Ricci,                     Recommendation Methods. In F. Ricci, L. Rokach,
     L. Rokach, and B. Shapira, editors, Recommender                and B. Shapira, editors, Recommender Systems
     Systems Handbook, pages 961–995. Springer, 2015.               Handbook, pages 37–76. Springer, 2015.
 [4] K. Chen, P. P. Chan, F. Zhang, and Q. Li. Shilling        [18] M. O’Mahony, N. Hurley, N. Kushmerick, and
     Attack based on Item Popularity and Rated Item                 G. Silvestre. Collaborative Recommendation: A
     Correlation against Collaborative Filtering.                   Robustness Analysis. ACM Trans. Internet Technol.,
     International Journal of Machine Learning and                  4(4):344–377, Nov. 2004.
     Cybernetics, 10(7):1833–1845, 2019.                       [19] M. P. O’Mahony, N. J. Hurley, and G. C. Silvestre.
 [5] P. Covington, J. Adams, and E. Sargin. Deep Neural             Recommender systems: Attack Types and Strategies.
     Networks for YouTube Recommendations. In                       In AAAI, pages 334–339, 2005.
     Proceedings of the 10th ACM Conference on                 [20] M. P. O’Mahony, N. J. Hurley, and G. C. Silvestre.
     Recommender Systems, RecSys ’16, page 191–198,                 Attacking Recommender Systems: The Cost of
     New York, NY, USA, 2016. Association for                       Promotion. In Proc. of the Workshop on Recommender
     Computing Machinery.                                           Systems, in Conjunction with the 17th Eur. Conf. on
 [6] T. George and S. Merugu. A scalable Collaborative              Artif. Intell., Riva del Garda, Italy, pages 24–28, 2006.
     Filtering Framework based on Co-Clustering. In Fifth      [21] M. Pichl, E. Zangerle, and G. Specht. Improving
     IEEE International Conference on Data Mining                   Context-Aware Music Recommender Systems: Beyond
     (ICDM’05), pages 4–pp. IEEE, 2005.                             the Pre-Filtering Approach. In Proceedings of the 2017
 [7] C. A. Gomez-Uribe and N. Hunt. The Netflix                     ACM on International Conference on Multimedia
     Recommender System: Algorithms, Business Value,                Retrieval, ICMR ’17, page 201–208, New York, NY,
     and Innovation. ACM Transactions on Management                 USA, 2017. Association for Computing Machinery.
     Information Systems, 6(4), Dec. 2016.                     [22] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen.
 [8] F. M. Harper and J. A. Konstan. The MovieLens                  Collaborative Filtering Recommender Systems. In The
     Datasets: History and Context. ACM Trans. Interact.            Adaptive Web, pages 291–324. Springer, 2007.
     Intell. Syst. (TiiS), 5(4):19:1–19:19, Dec. 2015.         [23] M. Schedl, P. Knees, B. McFee, D. Bogdanov, and
 [9] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich.         M. Kaminskas. Music Recommender Systems. In
     Recommender Systems - An Introduction. Cambridge               F. Ricci, L. Rokach, and B. Shapira, editors,
     University Press, 2010.                                        Recommender Systems Handbook, pages 453–492.
[10] Y. Koren, R. Bell, and C. Volinsky. Matrix                     Springer, 2015.
     Factorization Techniques for Recommender Systems.