=Paper=
{{Paper
|id=Vol-3924/short1
|storemode=property
|title=The Role of Fake Users in Sequential Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-3924/short1.pdf
|volume=Vol-3924
|authors=Filippo Bettello
|dblpUrl=https://dblp.org/rec/conf/robustrecsys/Bettello24
}}
==The Role of Fake Users in Sequential Recommender Systems==
Filippo Betello
Sapienza University of Rome, Rome, Italy
Abstract
Sequential Recommender Systems (SRSs) are widely used to model user behavior over time, yet their robustness remains an under-explored area of research. In this paper, we conduct an empirical study to assess how the presence of fake users (who engage in random interactions, follow popular or unpopular items, or focus on a single genre) impacts the performance of SRSs in real-world scenarios. We evaluate two SRS models across multiple datasets, using established metrics such as Normalized Discounted Cumulative Gain (NDCG) and Rank List Sensitivity (RLS) to measure performance. While traditional metrics like NDCG remain relatively stable, our findings reveal that the presence of fake users severely degrades RLS metrics, often reducing them to near-zero values. These results highlight the need for further investigation into the effects of fake users on training data and emphasize the importance of developing more resilient SRSs that can withstand different types of adversarial attacks.

Keywords
Recommender Systems, Evaluation of Recommender Systems, Model Stability, Input Data Perturbation
RobustRecSys: Design, Evaluation, and Deployment of Robust Recommender Systems Workshop @ RecSys 2024, 18 October 2024, Bari, Italy.

1. Introduction

Recommender Systems (RSs) have become an essential part of our daily lives, helping users navigate the vast online information landscape [1]. With the global expansion of e-commerce services, social media platforms and streaming services, these systems have become essential for personalising content delivery and increasing user engagement [2].

Over the last several years, Sequential Recommender Systems (SRSs) have gained significant popularity as an effective method for modeling user behavior over time [3]. By capitalizing on the temporal dependencies within users' interaction sequences, these systems can make more precise predictions about user preferences [4]. This approach allows for a more nuanced understanding of user behavior, leading to recommendations that are better tailored to individual needs and preferences. As a result, SRSs have become a critical component in various applications, ranging from e-commerce [5] to music recommendation [6], where understanding and anticipating user preferences is key to enhancing user experience and engagement.

In recent years, the prevalence of bots (fake users) on social media platforms has increased dramatically [7]. It is estimated that Amazon, for example, spends 2% of its net revenue each year fighting counterfeiting [8]. While several techniques have been identified to counteract this growing problem [9, 10], a detailed investigation in the area of sequential recommendation systems is still lacking. Li et al. [11] aim to fill this gap by investigating the impact of bot-generated data on sequential recommendation models. Specifically, they seek to determine an optimal bot-generation budget and analyze its impact on popular matrix factorization models. Indeed, controlling and maintaining a large number of bots is costly. Therefore, it is possible to create a limited number of bots that can significantly influence the prominence of a particular item or category. By strategically deploying these bots, the visibility and perceived importance of the targeted item or category can be enhanced, making it stand out more compared to others. Imagine if, by using fake users, it were possible to raise the profile of a certain category or product or, conversely, to lower the profile of another. This scenario represents a form of unfair competition and is therefore crucial to study. Understanding how fake users behave in controlled environments allows us to assess their impact on real users. It is also important to investigate whether partially coordinated fake users can actively improve the performance or predictions of a particular category or item.

In this paper, we investigate the impact of fake users on sequential recommendation systems. Specifically, we investigate how the inclusion of a certain percentage of bots affects the performance of real users. These bots are programmed to interact with random items, popular items, unpopular items, or items within the same category.

Our experiments focus on the following research questions:

• RQ1: How does the value of standard metrics such as NDCG change for real users depending on the type and increasing number of fake users?
• RQ2: How do recommendation lists for real users differ from those generated without fake users?
• RQ3: Are more or less popular items favoured by the presence of fake users with certain types of interactions?

We evaluate our hypothesis using two different models, SASRec [12] and GRU4Rec [13], and by employing four different datasets, namely MovieLens 1M, MovieLens 100k [14], Foursquare New York City and Foursquare Tokyo [15].
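The protocol studied here (a given percentage of bot sequences added to the training data, with evaluation performed on real users only) can be sketched as follows. The function name and the list-of-sequences data layout are illustrative assumptions, not the paper's actual code:

```python
import random

def inject_fake_users(real_sequences, fake_sequences, fraction, seed=0):
    """Append fraction * len(real_sequences) fake-user sequences to the
    training data (illustrative sketch, not the paper's implementation)."""
    rng = random.Random(seed)
    n_fake = int(len(real_sequences) * fraction)
    # Fake users only touch the training split; evaluation stays on real users.
    return real_sequences + rng.sample(fake_sequences, n_fake)

# Toy example: add 10% fake users to 30 real training sequences.
real = [[1, 2, 3], [4, 5], [6, 7, 8, 9]] * 10
fake = [[10, 11], [12, 13], [14, 15], [16, 17]]
train = inject_fake_users(real, fake, 0.10)  # 30 real + 3 fake sequences
```

The same call with `fraction` set to 0.01, 0.05, 0.15 or 0.20 would reproduce the other injection levels considered in the paper.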
⋆ This work was partially supported by projects FAIR (PE0000013) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU. Supported also by the ERC Advanced Grant 788893 AMDROMA, EC H2020 RIA project "SoBigData++" (871042), and PNRR MUR project IR0000013-SoBigData.it. This work has also been supported by the project NEREO (Neural Reasoning over Open Data) funded by the Italian Ministry of Education and Research (PRIN) Grant no. 2022AEFHAZ.
betello@diag.uniroma1.it (F. Betello)
ORCID: 0009-0006-0945-9688 (F. Betello)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Related Work

2.1. Sequential Recommender Systems

Sequential recommendation systems (SRSs) use algorithms that analyze a user's past interactions with items to provide personalized recommendations over time. These systems have found widespread application in areas such as
e-commerce [16, 5], social media [17, 18], and music streaming services [19, 20, 6]. Unlike traditional recommender systems, SRSs take into account the sequence and timing of user interactions, resulting in more precise predictions of user preferences and behaviors [4].

Various methods have been developed to implement SRSs. Early approaches used Markov Chain models [21, 22], which, despite their simplicity, struggled with capturing complex dependencies in long-term sequences. More recently, Recurrent Neural Networks (RNNs) have become prominent in this domain [23, 13, 24]. RNNs encode a user's historical preferences into a vector that is updated at each time step to predict the next item in the sequence. However, RNNs can encounter difficulties with long-term dependencies and with generating diverse recommendations.

The attention mechanism [25] has introduced another promising approach. Models like SASRec [12] and BERT4Rec [26] leverage this mechanism to dynamically weight different parts of the sequence, capturing key features to enhance prediction accuracy.

Additionally, Graph Neural Networks have recently gained traction in the recommendation system field, particularly within the sequential domain [27, 28]. These networks excel at modeling complex relationships and dependencies, further advancing the capabilities of SRSs [29, 30, 31].

2.2. Training Perturbations

Robustness is an important aspect of SRSs, as they are vulnerable to noisy and incomplete data. The authors of [32, 33] investigated the effects of removing items at the beginning, middle and end of a sequence of temporally ordered items, and found that removing items at the end of the sequence significantly affected all performances.

Yin et al. [34] design an attack that targets an attacker-chosen item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. While studies are being conducted in other areas of recommendation [35, 36] and several techniques have been identified to counteract this growing problem [9, 10], a detailed investigation in the area of sequential recommendation systems is still lacking. Li et al. [11] aim to address this issue by examining how bot-generated data affects sequential recommendation models. Their research focuses on finding the optimal budget for bot generation and assessing its influence on widely used matrix factorization models. Indeed, controlling and maintaining a large number of bots is costly. Previous research has proposed attacks using a limited number of users and clustering models [37], but these have not been extensively studied in the context of sequential recommendations.

To the best of our knowledge, our research is completely novel and breaks new ground. It explores the role that fake users might play in influencing real users. This study aims to shed light on the potential impact that fake users could have on the behaviour, opinions and interactions of real users within sequential recommendation systems.

3. Methodology

3.1. Background

The main objective of sequential recommendation systems is to predict the user's next interaction in a given sequence. Suppose we have a set of 𝑛 users, represented as 𝒰 ⊂ ℕ⁺, and a corresponding set of 𝑚 items, represented as ℐ ⊂ ℕ⁺. Each user 𝑢 ∈ 𝒰 is associated with a time-ordered sequence of interactions 𝑆𝑢 = [𝑠1, …, 𝑠𝐿𝑢], where each 𝑠𝑖 ∈ ℐ denotes the 𝑖-th item with which the user has interacted. The length of this sequence, 𝐿𝑢, is greater than 1 and varies from user to user.

A sequential recommendation system (SRS), denoted ℳ, processes the sequence up to the 𝐿-th item, denoted 𝑆𝑢𝐿 = [𝑠1, …, 𝑠𝐿], to suggest the next item, 𝑠𝐿+1. The recommendation output, 𝑟𝐿+1 = ℳ(𝑆𝑢𝐿) ∈ ℝ𝑚, is a score distribution over all possible items. This distribution is used to create a ranked list of items, predicting the most likely interactions for user 𝑢 in the next step, 𝐿 + 1.

3.2. Fake user design

Given that each item in the set ℐ has a popularity value determined by user interactions, we designed four types of fake user scenarios:

• Random: Items are randomly sampled from the entire set ℐ. Formally, each item 𝑠𝑖 in the sequence 𝑆𝑢 is selected with probability 1/|ℐ|.
• Popularity: Items are sampled according to a popularity-based probability distribution 𝑃pop, where the probability of selecting item 𝑠𝑖 is proportional to its popularity 𝑝𝑖.
• Unpopularity: Similar to the popularity-based scenario, but with a distribution 𝑃unpop that inversely favors popular items. Here, the probability of selecting item 𝑠𝑖 is inversely proportional to its popularity, Pr(𝑠𝑖) ∝ 1/𝑝𝑖, favoring less popular items.
• Genre: In this scenario, items are sampled exclusively from a specific genre. It is only applied to the ML datasets.

These fake-user sequences contain unique items, ensuring there are no repetitions. While the first scenario involves users acting independently without any sense of cooperation, the middle two scenarios introduce a level of implicit cooperation. Specifically, users in these scenarios tend to converge on viewing either highly popular or highly unpopular items, reflecting a collective behavior. The average length of the sequences is the same as that of real users. The proportion of synthetic users varies, comprising 1%, 5%, 10%, 15% and 20% of the original dataset. The fake users are only used in the training data, leaving the test data unaffected.

3.3. Models

In our study, we use two different architectures to validate our results:

• SASRec [12], which uses self-attention mechanisms to evaluate the importance of each interaction between the user and the item.
• GRU4Rec [13], an RNN model that uses gated recurrent units (GRUs) [38] to improve prediction accuracy.

We chose these two models because they have demonstrated exceptional performance in numerous benchmarks and are widely cited in the academic literature. Moreover, since one model is based on attention mechanisms and the other on RNNs, their different network operations make it particularly interesting to evaluate their behaviour.
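The Random, Popularity and Unpopularity scenarios of Section 3.2, combined with the no-repetition constraint, can be read as weighted sampling without replacement. A minimal sketch under that reading (an illustration of the design described above, not the paper's released code):

```python
import random

def sample_fake_sequence(popularity, mode, length, seed=0):
    """Draw `length` distinct items for one fake user.

    popularity: dict mapping item id -> popularity count p_i (assumed > 0).
    mode: "random" (uniform), "popular" (Pr ~ p_i) or "unpopular" (Pr ~ 1/p_i).
    """
    rng = random.Random(seed)
    items = list(popularity)
    if mode == "random":
        weights = [1.0] * len(items)
    elif mode == "popular":
        weights = [popularity[i] for i in items]
    elif mode == "unpopular":
        weights = [1.0 / popularity[i] for i in items]
    else:
        raise ValueError(f"unknown mode: {mode}")
    sequence = []
    for _ in range(length):
        # One weighted draw; removing the drawn item keeps the sequence repeat-free.
        [pick] = rng.choices(range(len(items)), weights=weights, k=1)
        sequence.append(items.pop(pick))
        weights.pop(pick)
    return sequence

# Example: a short fake user biased toward popular items.
counts = {1: 50, 2: 30, 3: 10, 4: 5, 5: 1}
fake_seq = sample_fake_sequence(counts, "popular", 3)
```

The Genre scenario would simply restrict `popularity` to the items of one genre before calling the function.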
Table 1
Dataset statistics after pre-processing; users and items not having at least 5 interactions are removed. Avg. and Med. refer to the Average and Median of Actions/User, respectively.

Name      Users   Items    Interactions   Density (%)   Avg.   Med.
FS-NYC    1,083    9,989        179,468         1.659    165    116
FS-TKY    2,293   15,177        494,807         1.421    215    146
ML-100k     943    1,349         99,287         7.805    105     64
ML-1M     6,040    3,416        999,611         4.845    165     96

3.4. Datasets

We use four different datasets:

MovieLens [14]: Frequently utilized to evaluate recommender systems, this benchmark dataset is employed in our study using both the 100K and 1M versions.

Foursquare [15]: This dataset includes check-in data from New York City and Tokyo, collected over a span of roughly ten months.

The statistics for all the datasets are shown in Table 1. Our pre-processing technique adheres to recognised principles, such as treating ratings as implicit, using all interactions without regard to the rating value, and deleting users and items with fewer than 5 interactions [12, 26]. For testing, as in [26, 12], we keep the most recent interaction for each user, while for validation, we keep the second-to-last action. The remaining interactions are added to the training set, which is the only one affected by the fake-user perturbation.

We focus exclusively on genres in the ML dataset, as it is the only dataset that contains category information. Specifically, we select only those categories that represent more than 5% of the total items in the dataset.

3.5. Evaluation

We carry out the evaluation only on the real users. To evaluate the performance of the models, we employ traditional evaluation metrics used for Sequential Recommendation: Precision, Recall, MAP and NDCG. Moreover, to investigate the stability of the recommendation models, we employ the Rank List Sensitivity (RLS) [33]: it compares two lists of rankings 𝒳 and 𝒴, one derived from the model trained under standard conditions and the other derived from the model trained with perturbed data.

Given these two rankings, and a similarity function sim between them, we can formalise the RLS measure as

  RLS = (1/|𝒳|) Σ_{𝑘=1}^{|𝒳|} sim(𝑋𝑘, 𝑌𝑘)   (1)

where 𝑋𝑘 and 𝑌𝑘 represent the 𝑘-th ranking inside 𝒳 and 𝒴, respectively.

RLS's similarity measure can be chosen from two possible options:

• Jaccard Similarity (JAC) [39] is a normalized measure of the similarity of the contents of two sets. A model is stable if its Jaccard score is close to 1:

  JAC(X, Y) = |𝑋 ∩ 𝑌| / |𝑋 ∪ 𝑌|   (2)

• Finite Rank-Biased Overlap (FRBO) [32] measures the similarity of orderings between two rank lists. Higher values indicate that the items in the two lists are arranged similarly:

  FRBO(X, Y)@k = (1 − 𝑝)/(1 − 𝑝^𝑘) · Σ_{𝑑=1}^{𝑘} 𝑝^{𝑑−1} · |𝑋[1:𝑑] ∩ 𝑌[1:𝑑]| / 𝑑

All metrics are computed "@𝑘", meaning that we use just the first 𝑘 recommended items in the output ranking, with 𝑘 ∈ {10, 20}.

3.6. Experimental Setup

All experiments were performed on a single NVIDIA RTX A6000 with 10752 CUDA cores and 48 GB of RAM. We train the models for 500 epochs, fixing the batch size to 128 and using the Adam optimizer [40] with a learning rate of 10⁻³. To run our experiments, we use the EasyRec library [41].

4. Results

Our experiments aim to address the following research questions:

• RQ1: How does the value of standard metrics such as NDCG change for real users depending on the type and increasing number of fake users?
• RQ2: How do recommendation lists for real users differ from those generated without fake users?
• RQ3: Are more or less popular items favoured by the presence of fake users with certain types of interactions?

4.1. RQ1: Impact of Fake Users on Standard Metrics for Real Users

In Figure 1, the results for all datasets considered are shown for both models using the standard metrics.

Regarding SASRec, shown in Figure 1d for the FS-NYC dataset, we observe that the performance tends to improve slightly for the unpopular scenario on the NDCG@20 metric, while for the popular and random interactions there is a gradual but consistent decline in performance. Regarding genre interactions in the ML-1M dataset, shown in Figure 1a, all genres appear to positively impact the NDCG metric. A more detailed analysis using RLS metrics is presented in Section 4.2.

In the case of GRU4Rec (figs. 1b and 1c), there is a slow but steady decline in performance for the ML-100k and FS-TKY datasets, with the decline occurring in a predictable manner for both metrics considered as the percentage of fake users increases.

4.2. RQ2: Analysis of Recommendation Lists Generated for Real Users

In Figure 2 we present the RLS metrics for all datasets considered, comparing the performance of the two models. These metrics are derived from predictions made by the standard model (without fake users) and predictions made after training with fake users.

When analysing the SASRec model on the ML-100k dataset (fig. 2a), SASRec shows minimal performance degradation. Conversely, the FS-TKY dataset gives less favourable results, with significantly worse performance and a Jaccard
index close to 0, indicating that the generated lists have almost no overlap with the original lists (fig. 2b). Figures 2c and 2d show the performance on the ML-100k dataset for genre sampling and the ML-1M dataset for the other sampling methods. On the ML-1M dataset, the performance is relatively good, although the Jaccard index remains low at around 0.35 (fig. 2c). For ML-100k and genre interactions, the degradation in performance is consistent across all genres, with the degradation worsening as the number of fake users increases.

The evaluation metrics for Foursquare show a significant drop in performance compared to other datasets, highlighting the limitations of the dataset [42].

An additional observation is that as the number of fake users increases, the performance of the model generally deteriorates. This suggests that while adding more fake users tends to reduce the effectiveness of the lists generated, managing a higher number of fake users becomes increasingly difficult.

Figure 1: Plots of various metrics for all the datasets considered as the percentage of fake users increases. The baseline is shown as a horizontal solid line, while the other lines show the metrics as the percentage of fake users changes for the three scenarios considered. Panels: (a) NDCG@20 ML-1M SASRec; (b) MAP@10 FS-TKY GRU4Rec; (c) NDCG@20 ML-100k GRU4Rec; (d) NDCG@20 FS-NYC SASRec.

4.3. RQ3: Influence of Fake User Interactions on Popular and Unpopular Items

We investigated whether popular and unpopular items were favoured in recommendation lists by analysing the percentage of the top 20 items recommended to each user. Our results show that unpopular items were consistently under-represented in these lists. This suggests that more users, a wider range of items, or consideration of a larger number of top positions (e.g. the top 100 items) may be necessary to gain a better understanding. On the other hand, in the ML-100k dataset, the percentage of popular items in the recommendation lists without any user-specific adjustments is 5.73%. The introduction of popular users barely affects this percentage (5.68%), while the inclusion of non-popular users slightly reduces it to 5.45%.

These results suggest significant opportunities for future research, such as focusing on specific categories of items to either improve or reduce recommendation performance.

5. Conclusion

In this work we investigated the impact of fake users on real users. These fake users can have random interactions or interact with popular or unpopular items, and are only added to the training set at different percentages of the total dataset. The results showed that although the standard metrics were not significantly affected, with random perturbations causing the most significant degradation in performance, the output lists generated under these perturbations were significantly different from the standard lists trained without any perturbations. These differences, measured using rank list sensitivity metrics, in particular Jaccard and FRBO, showed that in the case of MovieLens about half of the list elements were shared, whereas in the case of Foursquare almost no elements were shared. Furthermore, the proportion of popular and unpopular items in recommendations for real users was not affected by the presence of fake users.

This study opens up future research directions in a number of ways. First, it would be valuable to compare the number of recommended items (categorised as popular, unpopular and genre-specific) generated by a standard training model with those generated by a model trained on fake users. This comparison could reveal significant differences in recommendation patterns. Second, the creation of a set of fake users could make it possible to systematically elevate or downgrade certain categories over time. Third, studying datasets with shorter interaction sequences, such as those from Amazon [43], could provide new insights into user
behaviour and recommendation effectiveness. Finally, research should focus on building resilient models for these types of perturbations: the solution could lie in different training strategies [44], robust loss functions [45, 46], or different optimisation objectives [47].

Figure 2: Plots of RLS metrics for all the datasets considered as the percentage of fake users increases. The metrics are shown as the percentage of fake users changes for the three scenarios considered. Panels: (a) RLS-FRBO ML-100k SASRec; (b) RLS-FRBO FS-TKY SASRec; (c) RLS-JAC ML-1M GRU4Rec; (d) RLS-FRBO ML-100k GRU4Rec.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (2005) 734–749.
[2] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Computing Surveys 52 (2019). doi:10.1145/3285029.
[3] M. Quadrana, P. Cremonesi, D. Jannach, Sequence-aware recommender systems, ACM Computing Surveys (CSUR) 51 (2018) 1–36.
[4] S. Wang, L. Hu, Y. Wang, L. Cao, Q. Z. Sheng, M. Orgun, Sequential recommender systems: challenges, progress and prospects, arXiv preprint arXiv:2001.04830 (2019).
[5] H. Hwangbo, Y. S. Kim, K. J. Cha, Recommendation system development for fashion retail e-commerce, Electronic Commerce Research and Applications 28 (2018) 94–101.
[6] D. Afchar, A. Melchiorre, M. Schedl, R. Hennequin, E. Epure, M. Moussallam, Explainability in music recommender systems, AI Magazine 43 (2022) 190–208.
[7] E. Ferrara, O. Varol, C. Davis, F. Menczer, A. Flammini, The rise of social bots, Communications of the ACM 59 (2016) 96–104.
[8] M. Daniels, Amazon says its stopped 700k counterfeiters from making accounts last year, 2024. URL: https://www.modernretail.co/technology/amazon-says-its-stopped-700k-counterfeiters-from-making-accounts-last
[9] M. Mendoza, M. Tesconi, S. Cresci, Bots in social and interaction networks: detection and impact estimation, ACM Transactions on Information Systems (TOIS) 39 (2020) 1–32.
[10] M. Mazza, S. Cresci, M. Avvenuti, W. Quattrociocchi, M. Tesconi, Rtbust: Exploiting temporal patterns for botnet detection on twitter, in: Proceedings of the 10th ACM Conference on Web Science, 2019, pp. 183–192.
[11] H. Li, S. Di, L. Chen, Revisiting injective attacks on recommender systems, Advances in Neural Information Processing Systems 35 (2022) 29989–30002.
[12] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197–206.
[13] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, 2016. arXiv:1511.06939.
[14] F. M. Harper, J. A. Konstan, The movielens datasets: History and context, ACM Transactions on Interactive Intelligent Systems 5 (2015). doi:10.1145/2827872.
[15] D. Yang, D. Zhang, V. W. Zheng, Z. Yu, Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns, IEEE Transactions on Systems, Man, and Cybernetics: Systems 45 (2014) 129–142.
[16] J. B. Schafer, J. A. Konstan, J. Riedl, E-commerce recommendation applications, Data Mining and Knowledge Discovery 5 (2001) 115–153.
[17] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, E. Uziel, Social media recommendation based on people and tags, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 194–201. doi:10.1145/1835449.1835484.
[18] F. Amato, V. Moscato, A. Picariello, G. Sperlí, Recommendation in social media networks, in: 2017 IEEE Third International Conference on Multimedia Big Data (BigMM), IEEE, 2017, pp. 213–216.
[19] M. Schedl, P. Knees, B. McFee, D. Bogdanov, M. Kaminskas, Music recommender systems, Recommender Systems Handbook (2015) 453–492.
[20] M. Schedl, H. Zamani, C.-W. Chen, Y. Deldjoo, M. Elahi, Current challenges and visions in music recommender systems research, International Journal of Multimedia Information Retrieval 7 (2018) 95–116.
[21] F. Fouss, A. Pirotte, M. Saerens, A novel way of computing similarities between nodes of a graph, with application to collaborative recommendation, in: The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), IEEE, 2005, pp. 550–556.
[22] F. Fouss, S. Faulkner, M. Kolp, A. Pirotte, M. Saerens, et al., Web recommendation system based on a Markov-chain model, in: ICEIS (4), 2005, pp. 56–63.
[23] T. Donkers, B. Loepp, J. Ziegler, Sequential user-based recurrent neural network recommendations, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 152–160. doi:10.1145/3109859.3109877.
[24] M. Quadrana, A. Karatzoglou, B. Hidasi, P. Cremonesi, Personalizing session-based recommendations with hierarchical recurrent neural networks, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 130–137. doi:10.1145/3109859.3109896.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).
[26] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441–1450.
[27] J. Chang, C. Gao, Y. Zheng, Y. Hui, Y. Niu, Y. Song, D. Jin, Y. Li, Sequential recommendation with graph neural networks, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 378–387. doi:10.1145/3404835.3462968.
[28] Z. Fan, Z. Liu, J. Zhang, Y. Xiong, L. Zheng, P. S. Yu, Continuous-time sequential recommendation with temporal graph collaborative transformer, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 433–442. doi:10.1145/3459637.3482242.
[29] S. Wu, F. Sun, W. Zhang, X. Xie, B. Cui, Graph neural networks in recommender systems: a survey, ACM Computing Surveys 55 (2022) 1–37.
[30] A. Purificato, G. Cassarà, P. Liò, F. Silvestri, Sheaf neural networks for graph-based recommender systems, arXiv preprint arXiv:2304.09097 (2023).
[31] A. Purificato, F. Silvestri, Eco-aware graph neural networks for sustainable recommendations, arXiv preprint arXiv:2410.09514 (2024).
[32] F. Betello, F. Siciliano, P. Mishra, F. Silvestri, Investigating the robustness of sequential recommender systems against training data perturbations, in: European Conference on Information Retrieval, Springer, 2024, pp. 205–220.
[33] S. Oh, B. Ustun, J. McAuley, S. Kumar, Rank list sensitivity of recommender systems to interaction perturbations, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, CIKM '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 1584–1594. doi:10.1145/3511808.3557425.
[34] M. Yin, Y. Xu, M. Fang, N. Z. Gong, Poisoning federated recommender systems with fake users, in: Proceedings of the ACM on Web Conference 2024, 2024, pp. 3555–3565.
[35] G. Trappolini, V. Maiorca, S. Severino, E. Rodolà, F. Silvestri, G. Tolomei, Sparse vicious attacks on graph neural networks, IEEE Transactions on Artificial Intelligence 5 (2024) 2293–2303. doi:10.1109/TAI.2023.3319306.
[36] Z. Chen, F. Silvestri, J. Wang, Y. Zhang, G. Tolomei, The dark side of explanations: Poisoning recommender systems with counterfactual examples, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 2426–2430. doi:10.1145/3539618.3592070.
[37] Y. Wang, Y. Liu, Q. Wang, C. Wang, Clusterpoison: Poisoning attacks on recommender systems with limited fake users, IEEE Communications Magazine (2024).
[38] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, 2014. arXiv:1406.1078.
[39] P. Jaccard, The distribution of the flora in the alpine zone. 1, New Phytologist 11 (1912) 37–50.
[40] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[41] F. Betello, A. Purificato, F. Siciliano, G. Trappolini, A. Bacciu, N. Tonellotto, F. Silvestri, A reproducible analysis of sequential recommender systems, IEEE Access (2024).
[42] A. Klenitskiy, A. Volodkevich, A. Pembek, A. Vasilev, Does it look sequential? an analysis of datasets for evaluation of sequential recommendations, arXiv preprint arXiv:2408.12008 (2024).
[43] Y. Hou, J. Li, Z. He, A. Yan, X. Chen, J. McAuley, Bridging language and items for retrieval and recommendation, arXiv preprint arXiv:2403.03952 (2024).
[44] A. Petrov, C. Macdonald, Effective and efficient training for sequential recommendation using recency sampling, in: Proceedings of the 16th ACM Conference on Recommender Systems, 2022, pp. 81–91.
[45] M. S. Bucarelli, L. Cassano, F. Siciliano, A. Mantrach, F. Silvestri, Leveraging inter-rater agreement for classification in the presence of noisy labels, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3439–3448.
[46] F. A. Wani, M. S. Bucarelli, F. Silvestri, Learning with noisy labels through learnable weighting and centroid similarity, in: 2024 International Joint Conference on Neural Networks (IJCNN), IEEE, 2024, pp. 1–9.
[47] A. Bacciu, F. Siciliano, N. Tonellotto, F. Silvestri, Integrating item relevance in training loss for sequential recommender systems, in: Proceedings of the 17th ACM Conference on Recommender Systems, 2023, pp. 1114–1119.