=Paper=
{{Paper
|id=Vol-3924/short2
|storemode=property
|title=Removing Bad Influence: Identifying and Pruning Detrimental Users in Collaborative Filtering Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-3924/short2.pdf
|volume=Vol-3924
|authors=Philipp Meister,Lukas Wegmeth,Tobias Vente,Joeran Beel
|dblpUrl=https://dblp.org/rec/conf/robustrecsys/MeisterWVB24
}}
==Removing Bad Influence: Identifying and Pruning Detrimental Users in Collaborative Filtering Recommender Systems==
Philipp Meister, Lukas Wegmeth, Tobias Vente, and Joeran Beel
Intelligent Systems Group, University of Siegen, Germany
Abstract

Recommender systems commonly employ Collaborative Filtering to generate personalized recommendations, forming an implicit social network where users influence each other's recommendations based on their preferences. In this paper, we show that it is possible to identify users with detrimental influence, i.e., users who negatively affect the recommendations of others, and that merely removing specific detrimental users from the training data can improve system performance. We apply a Leave-one-out analysis across five datasets to capture how recommendations change if a specific user is removed. Based on that data, we quantify positive and negative influences and implement a pruning strategy to remove detrimental users. Importantly, our strategy still provides recommendations to the pruned users by recommending the most popular items. We evaluate our pruning strategy on five commonly used datasets, including MovieLens, Amazon, and LastFM. We show that pruning detrimental users increases kNN performance, achieving an average performance increase of 3% for Item-Item kNN while removing 3.56% of users from the training data. Our findings highlight the potential of influence-based pruning to enhance recommender systems by increasing performance and creating resilience against detrimental influence.
Keywords
Collaborative Filtering, Recommender Systems, User Influence, User Pruning
1. Introduction

Collaborative Filtering (CF) algorithms make recommendations based on the principle that users who agreed on items in the past will continue to agree in the future. The result is a system in which each user's recommendations are primarily determined by their similarity to other users. Due to users' influence on each other, Lathia et al. [1] interpret kNN CF recommender systems as implicit social networks. Much like in a social network, users' influence varies widely, resulting in a few users who significantly impact the overall system's behavior [2]. For our analysis, we define a user's influence as their ability to change other users' recommendations with their ratings.

In the quest to increase the performance and robustness of recommender systems, those influential users [3] are an important asset for recommender system engineers. For example, influential users can improve recommendations by rating new items, thereby addressing the cold start problem for those items [4]. On the other hand, bad actors, e.g., users with fake profiles or users who inject fake ratings to push certain items, could exploit the power of influential users and hurt recommendations. Wilson et al. [3] find that, depending on the dataset and algorithm, using just 1% of users for such an attack results in significant performance reductions.

The potential of influential users to either enhance or erode the quality of recommendations leads us to an intriguing possibility. We hypothesize that there are real users in the training data whose inclusion negatively impacts the recommendations for other users and that removing these users can improve overall system performance. We call them detrimental users. Intuitively, removing many or all detrimental users should improve recommendations. However, it is a common assumption that less training data decreases performance [5], so removing many detrimental users may hurt rather than increase performance. To investigate whether detrimental users exist and how removing them affects recommendations, we examine the following research question:

RQ: How can we identify and separate detrimental users?

To answer our research question, we analyze the influence of every individual user in five popular datasets using three CF algorithms, two kNN-based and one based on matrix factorization, to quantify user influence on ranking predictions. We show that it is possible to identify detrimental users who negatively impact the performance of other users and that pruning the most detrimental users can improve overall recommendations. The source code reproducing all the results presented in this paper is available on GitHub (https://code.isg.beel.org/influence-pruning/).

2. Related Work

Several aspects of user influence in recommender systems have been the subject of previous research. Rashid et al. [6] propose a general approach to determine user influence in rating-based recommender systems and analyze User-User and Item-Item CF systems using the Hide-one-User or Leave-one-out method. They discover correlations between user influence and several simple heuristics and use them to create a regression model to estimate user influence. The model predictions have a squared correlation coefficient of 0.94 for User-User and 0.99 for Item-Item, indicating that simple heuristics can estimate influence. Morid et al. [7] discover similar influence heuristics employing the same approach as Rashid et al. [8]. However, neither approach distinguishes between positive and detrimental influence.

In more recent work, Eskandanian et al. [2] study influential users in CF systems across different domains. Their analysis shows that the effect of influence is generally more substantial in matrix factorization systems compared to kNN, and that several factors, including centrality, number of ratings, and similarity to the average user, can identify influential users. Furthermore, they find that the effect of influence depends on parameters such as the number of latent features for matrix factorization. Like Rashid et al. [6, 8], the influence discrimination model used for the underlying analysis does not account for positive or negative influence, making it impossible to identify detrimental influential users.

Wilson et al. [3] discover that it is possible to hurt CF kNN recommender systems by conducting a targeted power user attack. In contrast, Seminario et al. [9] study the same for matrix factorization CF. In this context, power users are synonymous with our definition of influential users. They find that when power users are injected with biased ratings for new items, the MAE for User-User CF rises by up to 3% on the MovieLens 1M dataset [3]. This could imply that influential users harm recommendations depending on their rating profile. Additionally, their results show that Item-Item kNN is less vulnerable to power user attacks than User-User kNN and matrix factorization.

Existing research indicates that just a few influential users have the potential to considerably change recommendations [6, 3, 9], both positively and negatively. We expand on previous research by using implicit feedback data, e.g., unweighted user interactions, and by interpreting influence as a multi-dimensional metric. Furthermore, we distinguish between positive and detrimental influence and study how detrimental users can be identified and how pruning them affects recommender system performance.
3. Method

We examine detrimental users and their effect on recommendations in two parts. The first consists of a user influence analysis, which aims to identify detrimental influential users by quantifying influence via different metrics. To achieve this, we adopt the Leave-one-out (LOO) concept described by Rashid et al. [6] and develop a pipeline to capture influence data for every user. In the second part of the analysis, we use the obtained influence data to study how pruning users from the training data based on different influence metrics changes performance. One important remaining issue is that pruning removes valuable training data and disregards pruned users, which Beel et al. [10] identified as a widespread problem in recommender system research. To avoid that, we do not simply remove pruned users but instead calculate their recommendations separately, recommending the most popular items.

We use five datasets in our analysis: ML-100k, ML-1M [11], Last.FM [12], Amazon-Digital-Music, and Amazon-Luxury-And-Beauty [13]. We transform explicit feedback data, e.g., ratings, into implicit feedback data, e.g., interactions, by treating every rating as a positive interaction, and evaluate the NDCG@10 of User-User kNN, Item-Item kNN, and Alternating Least Squares (ALS) CF on each dataset. We use the algorithm implementations of LensKit [14].
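To make the setup concrete, the following minimal sketch shows what such a baseline evaluation could look like with LensKit's pre-2024 Python API. It is not the paper's exact code: the file path, the neighborhood size of 20, and the simple hold-out split are illustrative assumptions (the paper uses five-fold cross-validation).

```python
# Minimal sketch of the baseline evaluation, assuming the pre-2024 LensKit
# API (lenskit ~0.14). File path, neighborhood size, and the naive
# train/test split are illustrative assumptions.
import pandas as pd
from lenskit import batch, topn
from lenskit.algorithms import Recommender, item_knn

# Load explicit ratings and binarize them: every rating becomes a positive
# interaction, as described above. LensKit expects 'user'/'item' columns.
ratings = pd.read_csv('ml-100k.csv')              # hypothetical path
interactions = ratings[['user', 'item']]

# Hold out each user's last interaction as a stand-in for the CV split.
test = interactions.groupby('user').tail(1)
train = interactions.drop(test.index)

# Item-Item kNN in implicit-feedback mode, wrapped as a top-N recommender.
algo = Recommender.adapt(item_knn.ItemItem(20, feedback='implicit'))
algo.fit(train)

# Top-10 lists for all test users, scored with NDCG@10 per user.
recs = batch.recommend(algo, test['user'].unique(), 10)
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg, k=10)
baseline = rla.compute(recs, test)                # per-user NDCG@10
```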
The first part of our analysis follows a simple question: if one specific user is removed from the training data, how does the NDCG@10 change for every other user? To answer this question, we implement the following LOO pipeline. First, we calculate the baseline result for each algorithm, i.e., the NDCG@10 performance of each algorithm considering all users. From the obtained results, we build a vector b, with each entry 𝑏𝑖 representing the baseline NDCG@10 for a user 𝑖. Then, we prune the user 𝑖 from the training and test data. We train a new model and calculate the recommendations on this pruned data. These results form a vector r𝑖 for each pruned user 𝑖, with 𝑟𝑖,𝑗 being the NDCG@10 for user 𝑗 calculated without user 𝑖 in the training data. The NDCG@10 of the pruned user 𝑖 is set to 0 because the user receives no recommendations. The basis of the influence analysis is the difference between the pruned results r𝑖 and the baseline results b. For every user 𝑖, the vector ∆r𝑖 = r𝑖 − b describes this difference. Successively pruning every user 𝑖 results in a matrix

𝑅 = (∆r𝑖)𝑖=1,...,𝑛

where 𝑛 is the total number of users in the dataset. If ∆𝑟𝑖,𝑗 > 0 holds for users 𝑖 and 𝑗, user 𝑗 receives better recommendations when user 𝑖 is not in the training data; ergo, user 𝑖 has a detrimental influence on user 𝑗. Conversely, if ∆𝑟𝑖,𝑗 < 0 holds, the existence of user 𝑖 in the data improves user 𝑗's recommendations.
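The LOO pipeline can be condensed into the following sketch. Here, `evaluate` is a hypothetical helper that fits one algorithm on the given training data and returns per-user NDCG@10 (for example, the output of the LensKit analysis above); retraining once per pruned user is what makes this analysis computationally expensive.

```python
# Condensed sketch of the LOO influence pipeline. `evaluate(train, test)` is
# a hypothetical helper returning a pandas Series of per-user NDCG@10.
import numpy as np

users = sorted(interactions['user'].unique())
b = evaluate(train, test).reindex(users).fillna(0.0)   # baseline vector b

n = len(users)
R = np.zeros((n, n))                                   # R[i, j] = Δr_{i,j}
for i, u in enumerate(users):
    pruned_train = train[train['user'] != u]           # prune u from training...
    pruned_test = test[test['user'] != u]              # ...and test data
    r_i = evaluate(pruned_train, pruned_test).reindex(users).fillna(0.0)
    r_i.loc[u] = 0.0                                   # u receives no recommendations
    R[i] = (r_i - b).to_numpy()                        # Δr_i = r_i − b

# R[i, j] > 0: user j improves without user i (detrimental influence);
# R[i, j] < 0: user i's presence helps user j.
```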
Using 𝑅, we calculate the following four normalized influence metrics. The influence mean 𝜇 is the difference between the baseline and pruned mean NDCG@10. It describes how the overall system performance changed compared to the baseline performance due to pruning the user 𝑖. A feature of 𝜇𝑖 is that it depends on user 𝑖's baseline performance because 𝑖's performance on the pruned dataset is 0. To address this issue, we introduce the cleaned influence mean 𝛾, which removes the pruned user 𝑖 from the influence mean calculations. Furthermore, we introduce the influence difference 𝛿, which we derive from the 𝑁𝑃𝐷 metric presented by Rashid et al. [6]. It calculates the difference between the number of users influenced positively and negatively by user 𝑖. Finally, the influence score 𝛼 accounts for both the cleaned influence mean and the influence difference by calculating their difference with 𝛼 = 𝛿 − 𝛾.
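Read as code, the four metrics could be computed from 𝑅 as follows. The paper gives only verbal definitions, so the normalization by 𝑛 − 1 and the sign conventions below are assumptions read off those descriptions, not a reference implementation.

```python
# Sketch of the four influence metrics per user i, derived from the verbal
# definitions above; the exact normalization is an assumption.
import numpy as np

n = R.shape[0]

# influence mean μ: mean change over all users, including user i itself
# (whose entry is −b_i, which is why μ depends on i's baseline performance)
mu = R.mean(axis=1)

# cleaned influence mean γ: exclude the pruned user i from the average
off_diag = ~np.eye(n, dtype=bool)
gamma = (R * off_diag).sum(axis=1) / (n - 1)

# influence difference δ: positively minus negatively influenced users.
# Δr_{i,j} < 0 means user i's presence improves user j (positive influence).
pos = (R < 0).sum(axis=1)
neg = (R > 0).sum(axis=1)
delta = (pos - neg) / (n - 1)

# influence score α = δ − γ: detrimental users end up with low scores
alpha = delta - gamma
```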
To test whether pruning multiple detrimental users from the training data based on the acquired influence data improves recommendations, we evaluate an optimal pruning strategy on all datasets and algorithms using user-based five-fold cross-validation. We use random search to identify the optimal pruning threshold of each influence metric for each dataset and algorithm and prune users based on this optimal pruning strategy.
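A sketch of this pruning step, continuing the code above: `evaluate` and `popularity_ndcg` are hypothetical helpers, the threshold range and sample count are assumptions, and the actual experiments repeat this search per dataset, algorithm, and influence metric under five-fold cross-validation.

```python
# Sketch of threshold search and pruning with a popularity fallback, so that
# pruned users still receive recommendations. `popularity_ndcg` (hypothetical)
# scores a fixed top-10 list against the held-out data of the pruned users.
import numpy as np
import pandas as pd

def prune_and_evaluate(threshold: float) -> float:
    """Mean NDCG@10 over all users after pruning users with α < threshold."""
    detrimental = {u for u, a in zip(users, alpha) if a < threshold}
    kept_train = train[~train['user'].isin(detrimental)]

    # CF recommendations for the users remaining in the training data
    cf_scores = evaluate(kept_train, test[~test['user'].isin(detrimental)])

    # Pruned users are not discarded: they get the most popular items
    top10 = kept_train['item'].value_counts().head(10).index
    pop_scores = popularity_ndcg(top10, test[test['user'].isin(detrimental)])

    return pd.concat([cf_scores, pop_scores]).mean()

# Random search over candidate thresholds; range and sample count assumed.
rng = np.random.default_rng(0)
candidates = rng.uniform(alpha.min(), alpha.max(), size=50)
best_threshold = max(candidates, key=prune_and_evaluate)
```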
4. Results & Discussion

The result of pruning detrimental users is illustrated in Figure 1. Item-Item kNN benefits the most, with an average performance increase of 3%, while User-User kNN also shows improvements, but on a lower level of around 0.2%. ALS is, on average, negatively affected by pruning detrimental users. The relative performance change for the users remaining in the training data is, on average, around 0.5 percentage points better than for all users combined. This is expected, since the pruned users are recommended the most popular items, which perform worse than CF.

Figure 1: Aggregated performance change over all five datasets after pruning detrimental users with an optimal threshold and metric. Baseline and pruned results are calculated using five-fold cross-validation.

Algorithm        Pruned users (%)
User-User kNN    0.61%
Item-Item kNN    3.56%
ALS              0.44%

Table 1: The average percentage of users pruned from the training data for all tested algorithms.

The number of pruned users varies significantly depending on the dataset. Table 1 shows an average, with over 3.5% of users pruned for Item-Item kNN, confirming that removing multiple detrimental users can improve recommendations despite reducing the training data. We observe that the optimal influence metric and threshold depend on the dataset and algorithm. The performance increase in our experiments varies depending on the dataset, with larger datasets benefiting more. For example, we observe a three times higher relative performance improvement for ML-1M than for ML-100K.

To illustrate, Figure 2 shows the user distribution in the ML-1M dataset using Item-Item kNN. The dispersion of the influence score increases with rising influence. This leads to some influential users significantly negatively affecting other users. Pruning all users below the threshold shown in Figure 2 leads to a considerable overall performance improvement of over 7% for ML-1M with Item-Item kNN.

Figure 2: Correlation between the influence score of a user and the total number of other users the user influenced. The dashed line represents the threshold below which we remove all users from the training data.

The results from our pruning analysis answer our initial research question by confirming that the influence metrics we introduce can identify and differentiate users with positive and detrimental influences. The performance improvement we observe, especially for Item-Item kNN, demonstrates that the users we identify as detrimental harm the recommendations of other users. However, this effect depends on the algorithm, as shown by the reduced performance of ALS when pruning users. Future work should focus on understanding the characteristics of detrimental users and try to identify them based on heuristics without the need to conduct a computationally expensive LOO analysis.
References

[1] N. Lathia, S. Hailes, L. Capra, kNN CF: A temporal social network, in: Proceedings of the 2008 ACM Conference on Recommender Systems, 2008, pp. 227–234.
[2] F. Eskandanian, N. Sonboli, B. Mobasher, Power of the few: Analyzing the impact of influential users in collaborative recommender systems, in: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, 2019, pp. 225–233.
[3] D. C. Wilson, C. E. Seminario, When power users attack: Assessing impacts in collaborative recommender systems, in: Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 427–430.
[4] S. S. Anand, N. Griffiths, A market-based approach to address the new item problem, in: Proceedings of the Fifth ACM Conference on Recommender Systems, 2011, pp. 205–212.
[5] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Transactions on Management Information Systems (TMIS) 3 (2012) 1–17.
[6] A. M. Rashid, G. Karypis, J. Riedl, Influence in ratings-based recommender systems: An algorithm-independent approach, in: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM, 2005, pp. 556–560.
[7] M. A. Morid, M. Shajari, A. H. Golpayegani, Who are the most influential users in a recommender system?, in: Proceedings of the 13th International Conference on Electronic Commerce, 2011, pp. 1–5.
[8] A. M. Rashid, Mining influence in recommender systems, 2007.
[9] C. E. Seminario, D. C. Wilson, Assessing impacts of a power user attack on a matrix factorization collaborative recommender system, in: The Twenty-Seventh International FLAIRS Conference, 2014.
[10] J. Beel, V. Brunel, Data pruning in recommender systems research: Best-practice or malpractice, ACM RecSys (2019).
[11] F. M. Harper, J. A. Konstan, The MovieLens datasets: History and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5 (2015) 1–19.
[12] I. Cantador, P. Brusilovsky, T. Kuflik, 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), in: Proceedings of the 5th ACM Conference on Recommender Systems, RecSys 2011, ACM, New York, NY, USA, 2011.
[13] J. Ni, J. Li, J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 188–197.
[14] M. D. Ekstrand, LensKit for Python: Next-generation software for recommender systems experiments, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2999–3006.