Removing Bad Influence: Identifying and Pruning Detrimental Users in Collaborative Filtering Recommender Systems

Philipp Meister¹, Lukas Wegmeth¹, Tobias Vente¹ and Joeran Beel¹

¹Intelligent Systems Group, University of Siegen, Germany


Abstract

Recommender systems commonly employ Collaborative Filtering to generate personalized recommendations, forming an implicit social network in which users influence each other's recommendations based on their preferences. In this paper, we show that it is possible to identify users with detrimental influence, i.e., those who negatively affect the recommendations of others, and that merely removing specific detrimental users from the training data can improve system performance. We apply a Leave-one-out analysis across five datasets to capture how recommendations change when a specific user is removed. Based on that data, we quantify positive and negative influences and implement a pruning strategy to remove detrimental users. Importantly, our strategy still provides recommendations to the pruned users by recommending the most popular items. We evaluate our pruning strategy on five commonly used datasets, including MovieLens, Amazon, and Last.FM. We show that pruning detrimental users increases kNN performance, achieving an average improvement of 3% for Item-Item kNN while removing 3.56% of users from the training data. Our findings highlight the potential of influence-based pruning to enhance recommender systems by increasing performance and creating resilience against detrimental influence.

Keywords

Collaborative Filtering, Recommender Systems, User Influence, User Pruning



RobustRecSys: Design, Evaluation, and Deployment of Robust Recommender Systems Workshop @ RecSys 2024, 18 October 2024, Bari, Italy.
philipp.meister@student.uni-siegen.de (P. Meister); lukas.wegmeth@uni-siegen.de (L. Wegmeth); tobias.vente@uni-siegen.de (T. Vente); joeran.beel@uni-siegen.de (J. Beel)
ORCID: 0009-0008-6814-9668 (P. Meister); 0000-0001-8848-9434 (L. Wegmeth); 0009-0003-8881-2379 (T. Vente); 0000-0002-4537-5573 (J. Beel)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Collaborative Filtering (CF) algorithms make recommendations based on the principle that users who agreed on items in the past will continue to agree in the future. The result is a system in which each user's recommendations are primarily determined by their similarity to other users. Due to this mutual influence, Lathia et al. [1] interpret kNN CF recommender systems as implicit social networks. Much like in a social network, users' influence varies widely, resulting in a few users who significantly impact the overall system's behavior [2]. For our analysis, we define a user's influence as their ability to change other users' recommendations with their ratings.

In the quest to increase the performance and robustness of recommender systems, these influential users [3] are an important asset for recommender system engineers. For example, influential users can improve recommendations by rating new items, thereby addressing the cold start problem for those items [4]. On the other hand, bad actors, e.g., users with fake profiles or users who inject fake ratings to push certain items, could exploit the power of influential users and hurt recommendations. Wilson et al. [3] find that, depending on the dataset and algorithm, using just 1% of users for such an attack results in significant performance reductions.

The potential of influential users to either enhance or erode the quality of recommendations leads us to an intriguing possibility. We hypothesize that there are real users in the training data whose inclusion negatively impacts the recommendations for other users and that removing these users can improve overall system performance. We call them detrimental users. Intuitively, removing many or all detrimental users should improve recommendations. However, it is a common assumption that less training data decreases performance [5], so removing many detrimental users may hurt rather than improve performance. To investigate whether detrimental users exist and how removing them affects recommendations, we examine the following research question:

RQ: How can we identify and separate detrimental users?

To answer our research question, we analyze the influence of every individual user in five popular datasets using three CF algorithms (kNN and matrix factorization) to quantify user influence on ranking predictions. We show that it is possible to identify detrimental users who negatively impact the performance of other users and that pruning the most detrimental users can improve overall recommendations.

The source code reproducing all the results presented in this paper is available on our GitHub: https://code.isg.beel.org/influence-pruning/

2. Related Work

Several aspects of user influence in recommender systems have been the subject of previous research. Rashid et al. [6] propose a general approach to determine user influence in rating-based recommender systems and analyze User-User and Item-Item CF systems using the Hide-one-User, or Leave-one-out, method. They discover correlations between user influence and several simple heuristics and use them to create a regression model to estimate user influence. The model predictions have a squared correlation coefficient of 0.94 for User-User and 0.99 for Item-Item, indicating that simple heuristics can estimate influence. Morid et al. [7] discover similar influence heuristics employing the same approach as Rashid et al. [8]. However, neither approach distinguishes between positive and detrimental influence.

In more recent work, Eskandanian et al. [2] study influential users in CF systems across different domains. Their analysis shows that the effect of influence is generally more substantial in matrix factorization systems compared to
kNN, and several factors, including centrality, number of ratings, and similarity to the average user, can identify influential users. Furthermore, they find that the effect of influence depends on model parameters, such as the number of features in matrix factorization. Like Rashid et al. [6, 8], the influence discrimination model used for the underlying analysis does not account for positive or negative influence, making it impossible to identify detrimental influential users.

Wilson et al. [3] discover that it is possible to hurt CF kNN recommender systems by conducting a targeted power user attack. In contrast, Seminario et al. [9] study the same for matrix factorization CF. In this context, power users are synonymous with our definition of influential users. They find that when power users inject biased ratings for new items, the MAE for User-User CF rises by up to 3% on the MovieLens 1M dataset [3]. This could imply that influential users harm recommendations depending on their rating profile. Additionally, their results show that Item-Item kNN is less vulnerable to power user attacks than User-User kNN and matrix factorization.

Existing research indicates that just a few influential users have the potential to considerably change recommendations [6, 3, 9], both positively and negatively. We expand on previous research by using implicit feedback data, i.e., unweighted user interactions, and by interpreting influence as a multi-dimensional metric. Furthermore, we distinguish between positive and detrimental influence and study how detrimental users can be identified and how pruning them affects recommender system performance.

3. Method

We examine detrimental users and their effect on recommendations in two parts. The first consists of a user influence analysis, which aims to identify detrimental influential users by quantifying influence via different metrics. To achieve this, we adopt the Leave-one-out (LOO) concept described by Rashid et al. [6] and develop a pipeline to capture influence data for every user. In the second part of the analysis, we use the obtained influence data to study how pruning users from the training data based on different influence metrics changes performance. One important remaining issue is that pruning removes valuable training data and disregards pruned users, which Beel et al. [10] identified as a widespread problem in recommender system research. To avoid that, we do not simply remove pruned users but instead calculate their recommendations separately, recommending the most popular items.
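To make the fallback concrete, the most-popular recommendation for pruned users could be implemented as in the following minimal sketch; the function name and DataFrame layout are our assumptions, not the paper's code.

    # Hypothetical sketch: pruned users receive the globally most popular items
    # instead of being dropped. `train` is a pandas DataFrame of implicit
    # interactions with columns: user, item.
    import pandas as pd

    def popularity_fallback(train: pd.DataFrame, pruned_users, n: int = 10) -> pd.DataFrame:
        # Rank items by the number of distinct users who interacted with them.
        top_items = (
            train.groupby('item')['user'].nunique()
                 .sort_values(ascending=False)
                 .head(n)
                 .index
        )
        # Every pruned user gets the same top-n most-popular list.
        return pd.DataFrame(
            [{'user': u, 'item': item, 'rank': rank}
             for u in pruned_users
             for rank, item in enumerate(top_items, start=1)]
        )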
We use five datasets in our analysis: ML-100K and ML-1M [11], Last.FM [12], and Amazon-Digital-Music and Amazon-Luxury-And-Beauty [13]. We transform explicit feedback data (ratings) into implicit feedback data (interactions) by treating every rating as a positive interaction, and we evaluate the NDCG@10 of User-User kNN, Item-Item kNN, and Alternating Least Squares (ALS) CF on each dataset. We use the algorithm implementations of LensKit [14].
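For illustration, a baseline evaluation of this kind could look roughly as follows. This is a minimal sketch against the classic LensKit for Python API (around version 0.14; names differ in other releases), and the neighborhood size and data layout are our assumptions, not the paper's configuration.

    # Sketch of a baseline NDCG@10 evaluation with classic LensKit for Python
    # (lkpy ~0.14). `train` and `test` are pandas DataFrames of implicit
    # interactions with columns: user, item, rating (all ratings set to 1).
    from lenskit import batch, topn
    from lenskit.algorithms import Recommender, item_knn

    # Item-Item kNN configured for implicit feedback; 20 neighbors is illustrative.
    algo = Recommender.adapt(item_knn.ItemItem(20, feedback='implicit'))
    algo.fit(train)

    # Recommend 10 items per test user and score the lists with NDCG@10.
    recs = batch.recommend(algo, test['user'].unique(), 10)
    rla = topn.RecListAnalysis()
    rla.add_metric(topn.ndcg, k=10)
    per_user = rla.compute(recs, test)     # one row per user, column 'ndcg'
    print(per_user['ndcg'].mean())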
The first part of our analysis follows a simple question: if one specific user is removed from the training data, how does the NDCG@10 change for every other user? To answer this question, we implement the following LOO pipeline. First, we calculate the baseline result for each algorithm, i.e., the NDCG@10 performance of each algorithm considering all users. From the obtained results, we build a vector b, with each entry 𝑏𝑖 representing the baseline NDCG@10 for a user 𝑖. Then, we prune user 𝑖 from the training and test data, train a new model, and calculate the recommendations on this pruned data. These results form a vector r𝑖 for each pruned user 𝑖, with 𝑟𝑖,𝑗 being the NDCG@10 for user 𝑗 calculated without user 𝑖 in the training data. The NDCG@10 of the pruned user 𝑖 is set to 0 because this user receives no recommendations. The basis of the influence analysis is the difference between the pruned results r𝑖 and the baseline results b: for every user 𝑖, the vector ∆r𝑖 = r𝑖 − b describes this difference. Successively pruning every user 𝑖 results in a matrix

    𝑅 = (∆r𝑖)𝑖=1,...,𝑛,

where 𝑛 is the total number of users in the dataset. If ∆𝑟𝑖,𝑗 > 0 holds for users 𝑖 and 𝑗, user 𝑗 receives better recommendations when user 𝑖 is not in the training data, i.e., user 𝑖 has a detrimental influence on user 𝑗. Conversely, if ∆𝑟𝑖,𝑗 < 0 holds, the presence of user 𝑖 in the data improves user 𝑗's recommendations.
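Concretely, the pipeline can be sketched as follows. This is a minimal sketch, not the paper's code: evaluate is a hypothetical helper that trains the given algorithm on the supplied interactions and returns per-user NDCG@10 as a NumPy array aligned with the full user index (0.0 for users it cannot score).

    # Sketch of the LOO influence pipeline; `evaluate` is a hypothetical helper
    # (see above), `interactions` a pandas DataFrame with columns: user, item.
    import numpy as np

    users = np.sort(interactions['user'].unique())
    n = len(users)

    b = evaluate(interactions)            # baseline vector b, one NDCG@10 entry per user

    R = np.zeros((n, n))                  # row i will hold the influence vector Δr_i
    for i, u in enumerate(users):
        pruned = interactions[interactions['user'] != u]
        r = evaluate(pruned)              # per-user NDCG@10 without user u
        r[i] = 0.0                        # the pruned user receives no recommendations
        R[i] = r - b                      # Δr_i = r_i − b

Each iteration retrains the model from scratch, which is exactly why the paper later describes the LOO analysis as computationally expensive.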
Using 𝑅, we calculate the following four normalized influence metrics. The influence mean 𝜇 is the difference between the baseline and the pruned mean NDCG@10. It describes how the overall system performance changes relative to the baseline due to pruning user 𝑖. A caveat of 𝜇𝑖 is that it depends on user 𝑖's own baseline performance because 𝑖's performance on the pruned dataset is 0. To address this issue, we introduce the cleaned influence mean 𝛾, which removes the pruned user 𝑖 from the influence mean calculation. Furthermore, we introduce the influence difference 𝛿, which we derive from the 𝑁𝑃𝐷 metric presented by Rashid et al. [6]. It calculates the difference between the number of users influenced positively and the number influenced negatively by user 𝑖. Finally, the influence score 𝛼 combines the cleaned influence mean and the influence difference as 𝛼 = 𝛿 − 𝛾.
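Interpreted this way, the four metrics can be computed from the matrix 𝑅 built above. The sketch below assumes a sign convention (∆𝑟𝑖,𝑗 < 0 meaning user 𝑖 helps user 𝑗) and a simple normalization; the paper does not spell these out, so treat it as one plausible reading rather than the reference implementation.

    # Sketch of the four influence metrics computed from R (assumed conventions).
    import numpy as np

    n = R.shape[0]

    mu = R.mean(axis=1)                    # influence mean μ; includes user i's own entry

    off_diag = R.copy()                    # cleaned variant: mask user i's own column
    np.fill_diagonal(off_diag, np.nan)
    gamma = np.nanmean(off_diag, axis=1)   # cleaned influence mean γ

    helped = (off_diag < 0).sum(axis=1)    # users who do worse without user i
    hurt = (off_diag > 0).sum(axis=1)      # users who do better without user i
    delta = (helped - hurt) / (n - 1)      # influence difference δ, normalized to [-1, 1]

    alpha = delta - gamma                  # influence score α = δ − γ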
To test whether pruning multiple detrimental users from the training data based on the acquired influence data improves recommendations, we evaluate an optimal pruning strategy on all datasets and algorithms using user-based five-fold cross-validation. We use random search to identify the optimal pruning threshold of each influence metric for each dataset and algorithm and prune users based on this optimal pruning strategy.
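As a rough sketch of this search (cross-validation folds omitted), where evaluate_pruned is a hypothetical helper that retrains on the remaining users, recommends the most popular items to the pruned users, and returns the overall NDCG@10:

    # Random search over pruning thresholds for each influence metric.
    # `evaluate_pruned` is a hypothetical helper as described above.
    import numpy as np

    rng = np.random.default_rng(0)
    metrics = {'mu': mu, 'gamma': gamma, 'delta': delta, 'alpha': alpha}

    best = {'metric': None, 'threshold': None, 'ndcg': -np.inf}
    for name, scores in metrics.items():
        for _ in range(50):                          # candidate thresholds per metric
            t = rng.uniform(scores.min(), scores.max())
            keep = users[scores >= t]                # prune users scoring below t
            ndcg = evaluate_pruned(interactions, keep)
            if ndcg > best['ndcg']:
                best = {'metric': name, 'threshold': t, 'ndcg': ndcg}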
4. Results & Discussion

The result of pruning detrimental users is illustrated in Figure 1. Item-Item kNN benefits the most, with an average performance increase of 3%, while User-User kNN also shows improvements, but at a lower level of around 0.2%. ALS is, on average, negatively affected by pruning detrimental users. The relative performance change for the users remaining in the training data is, on average, around 0.5 percentage points better than for all users combined. This is expected, since the pruned users are recommended the most popular items, which perform worse than CF.
Figure 1: Aggregated performance change over all five datasets after pruning detrimental users with an optimal threshold and metric. Baseline and pruned results are calculated using five-fold cross-validation.



Algorithm        Pruned users (%)
User-User kNN    0.61
Item-Item kNN    3.56
ALS              0.44

Table 1: The average percentage of users pruned from the training data for all tested algorithms.

The number of pruned users varies significantly depending on the dataset. Table 1 shows the averages, with over 3.5% of users pruned for Item-Item kNN, confirming that removing multiple detrimental users can improve recommendations despite reducing the training data. We observe that the optimal influence metric and threshold depend on the dataset and algorithm. The performance increase in our experiments varies depending on the dataset, with larger datasets benefiting more. For example, we observe a three times higher relative performance improvement for ML-1M than for ML-100K.

To illustrate, Figure 2 shows the user distribution in the ML-1M dataset using Item-Item kNN. The dispersion of the influence score increases with rising influence. This leads to some influential users significantly negatively affecting other users. Pruning all users below the threshold shown in Figure 2 leads to a considerable overall performance improvement of over 7% for ML-1M with Item-Item kNN.

Figure 2: Correlation between the influence score of a user and the total number of other users the user influenced. The dashed line represents the threshold below which we remove all users from the training data.

The results from our pruning analysis answer our initial research question by confirming that the influence metrics we introduce can identify and differentiate users with positive and detrimental influences. The performance improvement we observe, especially for Item-Item kNN, demonstrates that the users we identify as detrimental harm the recommendations of other users. However, this effect depends on the algorithm, as shown by the reduced performance of ALS when pruning users. Future work should focus on understanding the characteristics of detrimental users and on identifying them with heuristics, without the need to conduct a computationally expensive LOO analysis.

References

[1] N. Lathia, S. Hailes, L. Capra, kNN CF: a temporal social network, in: Proceedings of the 2008 ACM conference on Recommender systems, 2008, pp. 227–234.

[2] F. Eskandanian, N. Sonboli, B. Mobasher, Power of the few: Analyzing the impact of influential users in collaborative recommender systems, in: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, 2019, pp. 225–233.

[3] D. C. Wilson, C. E. Seminario, When power users attack: assessing impacts in collaborative recommender systems, in: Proceedings of the 7th ACM conference on Recommender systems, 2013, pp. 427–430.

[4] S. S. Anand, N. Griffiths, A market-based approach to address the new item problem, in: Proceedings of the fifth ACM conference on Recommender systems, 2011, pp. 205–212.

[5] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Transactions on Management Information Systems (TMIS) 3 (2012) 1–17.

[6] A. M. Rashid, G. Karypis, J. Riedl, Influence in ratings-based recommender systems: An algorithm-independent approach, in: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM, 2005, pp. 556–560.
[7] M. A. Morid, M. Shajari, A. H. Golpayegani, Who are the most influential users in a recommender system?, in: Proceedings of the 13th international conference on electronic commerce, 2011, pp. 1–5.

[8] A. M. Rashid, Mining influence in recommender systems, 2007.

[9] C. E. Seminario, D. C. Wilson, Assessing impacts of a power user attack on a matrix factorization collaborative recommender system, in: The Twenty-Seventh International Flairs Conference, 2014.

[10] J. Beel, V. Brunel, Data pruning in recommender systems research: Best-practice or malpractice, ACM RecSys (2019).

[11] F. M. Harper, J. A. Konstan, The MovieLens datasets: History and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5 (2015) 1–19.

[12] I. Cantador, P. Brusilovsky, T. Kuflik, 2nd workshop on information heterogeneity and fusion in recommender systems (HetRec 2011), in: Proceedings of the 5th ACM conference on Recommender systems, RecSys 2011, ACM, New York, NY, USA, 2011.

[13] J. Ni, J. Li, J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), 2019, pp. 188–197.

[14] M. D. Ekstrand, LensKit for Python: Next-generation software for recommender systems experiments, in: Proceedings of the 29th ACM international conference on information & knowledge management, 2020, pp. 2999–3006.