A Regression Framework to Interpret the Robustness of Recommender Systems Against Shilling Attacks

Discussion Paper

Yashar Deldjoo¹, Tommaso Di Noia¹, Eugenio Di Sciascio¹ and Felice Antonio Merra¹,²

¹ Politecnico di Bari, via Orabona, 4, 70125 Bari, Italy
² The authors are listed in alphabetical order. Corresponding author: Felice Antonio Merra (felice.merra@poliba.it).

Abstract
Collaborative filtering recommender systems (CF-RSs) employ user-item feedback, e.g., ratings, purchases, or reviews, to exploit similarities among customers and produce personalized lists of products. Because they rely on the feedback of other customers, CF-RSs are vulnerable to shilling attacks, i.e., fake profiles injected into the platform by adversaries to steer the recommendation outcomes toward a malicious goal. While most works on shilling attacks propose novel attack methods or compare recommendation models and outputs with and without defenses, we have found a lack of study on the impact of dataset properties on the robustness of CF-RSs. In this work, we present a regression model to test whether dataset characteristics impact the robustness of CF-RSs under shilling attacks, and to interpret attack efficacy as a function of these characteristics. The obtained results can help system designers understand the causes of CF-RS performance variations in attack scenarios.

Keywords
Recommender systems, Shilling Attacks, Robustness

IIR 2021 – 11th Italian Information Retrieval Workshop, September 13–15, 2021, Bari, Italy
yashar.deldjoo@poliba.it (Y. Deldjoo); tommaso.dinoia@poliba.it (T. Di Noia); eugenio.disciascio@poliba.it (E. Di Sciascio); felice.merra@poliba.it (F. A. Merra)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction and Motivation

Collaborative filtering recommender systems (CF-RSs) are a core service of online platforms for increasing traffic and promoting sales [1, 2]. A key assumption of collaborative models is that users with similar past preferences will likely agree on the novel (next) items they interact with. However, CF-RSs are vulnerable to adversarial attacks [3] such as the injection of fake profiles, named shilling profiles [4, 5], perturbed side-data [6, 7], or perturbed model parameters [8]. The motivation for such attacks is often malicious, e.g., economic gain, market infiltration, or even damaging the underlying system (breaking the model's availability). For instance, fake social media accounts might be created to spread fake news, or false reviews could be posted about a product to promote (push) or demote (nuke) it. Indeed, past works have shown that a few fake profiles (e.g., 3% of the user base) are sufficient for a prediction shift of up to 30% [9, 10].

Three main directions have been explored on shilling attacks: attack designs, detection algorithms, and defense strategies. The main shilling attack strategies are random, average, popular, bandwagon, and love-hate [11]. These strategies assume a certain level of adversarial knowledge about the recommendation model, the recommendation outputs, and the properties of items (e.g., rating mean and entropy [12]) and users (e.g., user groups [13]). Detection strategies aim to filter out fake profiles before they are used for model learning [14, 15].
Robust algorithms try to reduce the influence of possible out-of-distribution profiles [16, 10]. Previous works have thus been oriented toward "win-lose" scenarios, i.e., answering questions such as "Which attack models impact the performance of specific recommendation models the most?" or "How much knowledge of recommendation algorithm B does a specific attack A require to influence it?". However, no effort has been made to interpret which dataset features can impact the effectiveness of attacks. Indeed, while it is well known that CF-RSs are affected by the sparsity of the dataset (e.g., a denser dataset can make the recommendation task easier [17]), no such claims exist in the case of shilling attacks.

In this work, we focus on a novel research question: "Given popular shilling attack types and CF models already recognized by the community, which dataset characteristics can explain an observed change in recommendation performance?" To answer this question, we propose a regression-based model to analyze the effects of dataset characteristics on the robustness of CF-RSs, and, via a large-scale experiment on three domains, we evaluate how three classes of data characteristics (rating structure, rating value, and rating distribution) may influence the robustness of CF-RSs. This work is an extended abstract of [18], published at SIGIR 2020.

2. Model

Let U and I denote the sets of users and items in a system, and let R ∈ ℝ^(|U|×|I|) be the complete user-item rating matrix, where each entry r_ui ∈ ℝ represents a rating assigned by user u ∈ U to item i ∈ I if it is a recorded interaction (we use K to indicate the set of recorded interactions). A shilling attack consists in injecting novel fake users, the shilling profiles (SP), each composed of I_S, the selected item set; I_F, the filler item set; I_φ, the unrated item set; and I_T, the target item set. I_S contains items identified by the attacker to exploit the owned knowledge and maximize the effectiveness of the attack; I_F holds randomly selected items whose rating scores are assigned to make the attack imperceptible; I_φ includes the items left unrated in the fake user profile; and I_T contains the items to push or nuke. The composition of SP varies with the attack strategy. We study: Random [12], Love-Hate [19], Bandwagon [20], Popular [21], Average [12], and Perfect Knowledge [22].
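To make this composition concrete, the sketch below assembles a single random-attack push profile under the decomposition above. It is a minimal illustration under our own assumptions (a 1-5 rating scale, Gaussian filler ratings around a global mean, and hypothetical function names), not the exact protocol of [12]:

```python
import numpy as np

def random_attack_profile(n_items, target_item, filler_size,
                          r_max=5.0, mu=3.6, sigma=1.1, rng=None):
    """Sketch of one shilling profile SP for a push attack.
    I_T = {target_item} gets the maximum rating; I_F holds `filler_size`
    randomly chosen items rated around an assumed global mean; everything
    else stays unrated (I_phi, encoded as 0). I_S is empty because the
    random attack is knowledge-free and uses no selected items."""
    rng = rng or np.random.default_rng()
    profile = np.zeros(n_items)                                    # I_phi
    candidates = np.setdiff1d(np.arange(n_items), [target_item])
    i_f = rng.choice(candidates, size=filler_size, replace=False)  # I_F
    profile[i_f] = np.clip(np.round(rng.normal(mu, sigma, filler_size)),
                           1.0, r_max)
    profile[target_item] = r_max                                   # I_T
    return profile
```

An average attack would substitute per-item rating means for the global mean (which presumes more knowledge of the rating matrix), while a bandwagon attack would additionally place highly rated popular items into I_S.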
To study the impact of dataset characteristics on the efficacy of this class of attacks, we use an explanatory model defined as follows:

Definition 1 (Framework). Let D be the set of datasets, let C be the number of data characteristic factors, and let X_c be the matrix containing the values of the independent variables (the data characteristics specified below). Then the regression model is built using the formulation

$$\mathbf{y} = \theta_0 + \boldsymbol{\theta}_c \mathbf{X}_c + \epsilon \qquad (1)$$

where y is the attack performance metric under analysis (the dependent variable), θ_0 is the intercept representing its expected value, θ_c = [θ_1, θ_2, ..., θ_C] is the vector of regression coefficients associated with the IVs, and ε is the error term.

Independent Variables (IV). We explore three classes of independent variables, describing (i) the structure (i.e., SpaceSize_log, Shape_log, and Density_log), (ii) the rating frequency (i.e., Gini_item and Gini_user), and (iii) the rating values (i.e., Std_rating) of the user-item rating matrix.

Large SpaceSize_log values might imply a higher chance of finding similar neighbor users or items. Therefore, as both attack and recommendation models rely on identifying like-minded users (neighbor users) or similarly rated items (neighbor items), we deem SpaceSize_log to be an impactful characteristic when evaluating the performance of shilling attacks.

Shape_log can impact the effectiveness of shilling profile injection attacks. For example, in domains where |U| ≫ |I|, there are more candidate neighbor users than candidate neighbor items for memory-based CF models. This situation might work to the advantage of user-based CF over item-based CF. Moreover, under a similar number of ratings, changing the shape implies changing the average number of ratings per item, |K| / |I|. We conjecture that this circumstance may impact the robustness of CF, since the construction of SP is mainly based on altering the popularity of targeted items.

Density is a well-recognized issue in the RS community, where density = 1 − sparsity. Sparser data means that the fraction of unrated items significantly exceeds the fraction of rated ones [23]. Low density can harm the performance of CF, reducing, for instance, the chance of discovering neighbors in memory-based CF or of building accurate model-based CF [24]. In [25], we have already identified a potential impact of dataset density on the effectiveness of shilling attacks.

Gini_item and Gini_user measure the concentration of items' and users' ratings, respectively; we use them to capture the rating frequency distribution. A Gini coefficient of 0 represents perfect equality (e.g., all users give the same number of ratings), while a value of 1 represents total inequality (e.g., a single user has given all the ratings).

Finally, we study Std_rating, motivated by the connection between high rating variance and recommendation performance claimed by Herlocker et al. [26], and by the linear, negative impact on accuracy shown in [17].

Dependent Variable (DV). The dependent variable used to study the effectiveness of the attack on an RS is the Incremental Overall Hit Ratio (ΔHR@k). This is a stability metric that measures whether the recommendation model recommends different products due to the attack, irrespective of their actual rating values [22]. The HR metric was proposed by Aggarwal [27].
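As a concrete illustration of the framework, the sketch below derives the six IVs from a sparse rating matrix and fits Eq. (1) by ordinary least squares. This is a minimal sketch under our own assumptions: base-10 logarithms, scipy/statsmodels as tooling, and hypothetical function names; [18] does not prescribe this implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.sparse import csr_matrix

def gini(counts):
    """Gini coefficient of a rating-count vector: 0 when all users (or
    items) collect the same number of ratings, approaching 1 when a single
    user (or item) concentrates almost all of them."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    shares = np.cumsum(x) / x.sum()
    return (n + 1 - 2 * shares.sum()) / n

def characteristics(R: csr_matrix) -> dict:
    """The six independent variables; base-10 logs are our assumption."""
    n_users, n_items = R.shape
    return {
        "SpaceSize_log": np.log10(n_users * n_items),
        "Shape_log": np.log10(n_users / n_items),
        "Density_log": np.log10(R.nnz / (n_users * n_items)),
        "Gini_user": gini(np.diff(R.indptr)),               # ratings per user
        "Gini_item": gini(np.bincount(R.indices, minlength=n_items)),
        "Std_rating": R.data.std(),                         # rating values
    }

# Eq. (1): regress the attack-performance observations y (one Delta-HR@k
# value per sampled sub-dataset) on the stacked characteristic vectors X_c.
def fit_framework(X_c, y):
    return sm.OLS(y, sm.add_constant(X_c)).fit()  # theta_0 via the constant
```

Here, model.params would hold θ_0 and θ_c, model.pvalues the significance levels, and model.rsquared_adj the adj. R² of the kind reported in Table 1.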
3. Experiments

We study three datasets: ML-20M [28], with movie ratings (|U| = 138,493, |I| = 26,744, |K| = 20,000,263, density = 0.0054); Yelp [29], containing users' reviews of businesses (|U| = 25,677, |I| = 25,778, |K| = 705,994, density = 0.0010); and LFM-1b [30], with user-artist play counts (|U| = 120,175, |I| = 521,232, |K| = 25,285,767, density = 0.0004). We use three CF-RSs available in [31]: User-kNN [32], Item-kNN [32], and SVD [33]. Additional reproducibility details are available in the original work [18]. Table 1 presents a snapshot of the full results, used to answer the two research questions below.

Table 1: Regression results for the within-dataset analysis (attack size 1%), dependent variable ΔHR@10. Full table results in [18]. Significance: *** p ≤ .001, ** p ≤ .01, * p ≤ .05.

Random attack:

| IV | User-kNN ML-20M | User-kNN Yelp | User-kNN LFM-1b | Item-kNN ML-20M | Item-kNN Yelp | Item-kNN LFM-1b | SVD ML-20M | SVD Yelp | SVD LFM-1b |
|---|---|---|---|---|---|---|---|---|---|
| R² (adj. R²) | 0.761 (0.758) | 0.838 (0.835) | 0.673 (0.668) | 0.820 (0.818) | 0.815 (0.812) | 0.666 (0.662) | 0.843 (0.841) | 0.908 (0.907) | 0.790 (0.788) |
| Constant | .179*** | .609*** | .717*** | .262*** | .610*** | .715*** | .482*** | .524*** | .688*** |
| SpaceSize_log | -0.063*** | .041 | -0.629*** | .008 | .003 | -0.520*** | .040* | .368*** | -0.368*** |
| Shape_log | .184*** | .248*** | .288* | .139*** | .198*** | .125 | .207*** | .275*** | .192 |
| Density_log | -0.189*** | -0.316* | -1.546*** | -0.174*** | -0.376** | -1.366*** | -0.274*** | .393*** | -1.047*** |
| Gini_user | .277 | -0.012 | 1.901*** | -0.223 | .030 | .891 | .178 | -0.660** | .988* |
| Gini_item | -0.102 | -0.485 | 1.753*** | -0.305 | -0.241 | 1.784*** | .102 | -1.270*** | 1.355*** |
| Std_rating | -0.072 | .287 | -0.152 | -0.120 | .326 | .012 | -0.240 | .311* | -0.108 |

Average attack:

| IV | User-kNN ML-20M | User-kNN Yelp | User-kNN LFM-1b | Item-kNN ML-20M | Item-kNN Yelp | Item-kNN LFM-1b | SVD ML-20M | SVD Yelp | SVD LFM-1b |
|---|---|---|---|---|---|---|---|---|---|
| R² (adj. R²) | 0.759 (0.756) | 0.831 (0.829) | 0.673 (0.668) | 0.819 (0.816) | 0.813 (0.811) | 0.666 (0.661) | 0.845 (0.843) | 0.910 (0.909) | 0.790 (0.788) |
| Constant | .187*** | .609*** | .717*** | .276*** | .608*** | .715*** | .502*** | .523*** | .690*** |
| SpaceSize_log | -0.063*** | .048 | -0.632*** | .018 | .010 | -0.513*** | .046** | .373*** | -0.339*** |
| Shape_log | .182*** | .260*** | .291* | .136*** | .201*** | .114 | .189*** | .273*** | .167 |
| Density_log | -0.189*** | -0.290* | -1.553*** | -0.162*** | -0.359** | -1.352*** | -0.271*** | .405*** | -0.991*** |
| Gini_user | .296 | .074 | 1.907*** | -0.265 | .028 | .857 | .095 | -0.652** | .833* |
| Gini_item | -0.072 | -0.522 | 1.755*** | -0.284 | -0.243 | 1.796*** | .258 | -1.267*** | 1.317*** |
| Std_rating | -0.065 | .299 | -0.150 | -0.114 | .312 | .019 | -0.242 | .322* | -0.079 |

RQ1. Is there an underlying relationship between the studied IVs and the DV? The results obtained for the adjusted coefficient of determination (adj. R²) show that the six dataset characteristics can explain more than 60% of the variation in ΔHR@k irrespective of the attack type, model, and dataset, providing empirical evidence to support the hypothesis that the six IVs can explain a large part of the DV. The explanatory power is highest for the model-based SVD approach (when comparing the global behavior of each CF model). However, no similar pattern could be observed across attack types.

RQ2. How do the IVs impact the DV in terms of significance and directionality? The significance of the computed regression coefficients tends to vary per IV or group of IVs. The results show that the regression coefficients computed for SpaceSize_log, Shape_log, and Density_log are statistically significant (p < 0.05, 0.01, or 0.001), providing enough statistical evidence to support the hypothesis that structural characteristics can explain DV variations. However, results for the other IVs vary depending on the ⟨attack, model, dataset⟩ triplet, or can be insignificant, as in the case of Std_rating. For instance, the coefficients for the Gini indices (i.e., Gini_user and Gini_item) are most significant for shilling attacks against SVD, particularly for samples drawn from the Yelp and LFM-1b datasets. The coefficients for Std_rating are insignificant (p-value > 0.05) in all but two of the experimented cases, implying that this characteristic, which deals directly with rating values, plays a less significant role in the impact of the attack.

Investigating the directionality of the coefficients, Table 1 shows that Density_log has a negative effect on ΔHR@k across the majority of cases. This result is consistent with the RS finding that increasing the density benefits the performance of CF-RSs [34, 17]. An explanation is the following: if we fix the number of users and items and increase the number of genuine ratings, the computed similarities become more accurate. Since these similarities are precisely what the inserted fake profiles pollute, adding more genuine feedback helps to decrease the impact of attacks. Additionally, we note that SpaceSize_log has a negative impact on ΔHR@k for neighborhood models, which means that increasing the space size makes neighborhood models less vulnerable to attacks. Finally, and on the contrary, Shape_log presents a consistently positive influence on the efficacy of the attacks. We explain this with the following example: increasing Shape_log leads to a larger number of users with respect to items (i.e., fewer items). In this way, it can be easier to push the target item to higher positions in the recommendation list (i.e., fewer items compete for them).
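Continuing the statsmodels sketch from Section 2, the significance and directionality analysis can be mechanized by reading signs and p-values off a fitted result. The helper below is hypothetical (it assumes X_c was passed as a pandas DataFrame so that coefficients carry names) and mirrors the star convention of Table 1:

```python
def coefficient_report(fitted):
    """Tabulate each coefficient's directionality (sign) and significance,
    using the star notation of Table 1 (n.s. = not significant)."""
    stars = [(0.001, "***"), (0.01, "**"), (0.05, "*")]
    report = []
    for name in fitted.params.index:
        coef, p = fitted.params[name], fitted.pvalues[name]
        sig = next((s for a, s in stars if p <= a), "n.s.")
        report.append((name, round(coef, 3),
                       "positive" if coef > 0 else "negative", sig))
    return report
```

For example, a starred, negative Density_log coefficient in this report would read as "denser datasets are harder to attack", matching the discussion above.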
4. Conclusion and Future Work

We have provided statistical evidence to accept the hypothesis that the chosen dataset properties account for a considerable portion of the variation in attack performance. In particular, the structural properties (i.e., size, shape, and density) have a significant impact across models, the distributional properties (i.e., the Gini indices) have a higher impact on the model-based SVD, and the standard deviation of ratings shows no effect. Extending the framework to novel characteristics, attacks, and models is a natural direction for future work.

Acknowledgments

We acknowledge the support of PON ARS01_00876 BIO-D, Casa delle Tecnologie Emergenti della Città di Matera, PON ARS01_00821 FLET4.0, PIA Servizi Locali 2.0, H2020 Passepartout (Grant n. 101016956), and PIA ERP4.0.

References

[1] C. A. Gomez-Uribe, N. Hunt, The Netflix recommender system: Algorithms, business value, and innovation, ACM Trans. Management Inf. Syst. 6 (2016) 13:1–13:19. doi:10.1145/2843948.
[2] B. Smith, G. Linden, Two decades of recommender systems at Amazon.com, IEEE Internet Computing 21 (2017) 12–18. doi:10.1109/MIC.2017.72.
[3] Y. Deldjoo, T. D. Noia, F. A. Merra, A survey on adversarial recommender systems: From attack/defense strategies to generative adversarial networks, ACM Comput. Surv. 54 (2021) 35:1–35:38.
[4] I. Gunes, C. Kaleli, A. Bilge, H. Polat, Shilling attacks against recommender systems: A comprehensive survey, Artif. Intell. Rev. 42 (2014) 767–799.
[5] V. W. Anelli, Y. Deldjoo, T. Di Noia, E. Di Sciascio, F. A. Merra, SAShA: Semantic-aware shilling attacks on recommender systems exploiting knowledge graphs, in: 17th European Semantic Web Conference (ESWC 2020), Springer, 2020.
[6] V. W. Anelli, T. D. Noia, D. Malitesta, F. A. Merra, Assessing perceptual and recommendation mutation of adversarially-poisoned visual recommenders (short paper), in: DP@AI*IA, volume 2776 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 49–56.
[7] T. D. Noia, D. Malitesta, F. A. Merra, TAaMR: Targeted adversarial attack against multimedia recommender systems, in: DSN Workshops, IEEE, 2020, pp. 1–8.
[8] V. W. Anelli, T. D. Noia, F. A. Merra, The idiosyncratic effects of adversarial training on bias in personalized recommendation learning, in: RecSys 2021: Fifteenth ACM Conference on Recommender Systems (RecSys '21), September 27–October 1, 2021, Amsterdam, Netherlands, ACM, 2021. doi:10.1145/3460231.3478858.
[9] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender Systems - An Introduction, Cambridge University Press, 2010. URL: http://www.cambridge.org/de/academic/subjects/computer-science/knowledge-management-databases-and-data-mining/recommender-systems-introduction?format=HB.
[10] S. Alonso, J. Bobadilla, F. Ortega, R. Moya, Robust model-based reliability approach to tackle shilling attacks in collaborative filtering recommender systems, IEEE Access 7 (2019) 41782–41798.
[11] M. P. O'Mahony, Towards robust and efficient automated collaborative filtering, Ph.D. thesis, Citeseer, 2004.
[12] S. K. Lam, J. Riedl, Shilling recommender systems for fun and profit, in: WWW, ACM, 2004, pp. 393–402.
[13] R. D. Burke, B. Mobasher, R. Bhaumik, C. Williams, Segment-based injection attacks against collaborative filtering recommender systems, in: ICDM, IEEE Computer Society, 2005, pp. 577–580.
[14] W. Zhou, J. Wen, Q. Xiong, M. Gao, J. Zeng, SVM-TIA: A shilling attack detection method based on SVM and target item analysis in recommender systems, Neurocomputing 210 (2016) 197–205.
[15] M. Aktukmak, Y. Yilmaz, I. Uysal, Quick and accurate attack detection in recommender systems through user attributes, in: RecSys, ACM, 2019, pp. 348–352.
[16] F. Zhang, Y. Lu, J. Chen, S. Liu, Z. Ling, Robust collaborative filtering based on non-negative matrix factorization and R1-norm, Knowl.-Based Syst. 118 (2017) 177–190.
[17] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Trans. Management Inf. Syst. 3 (2012) 3:1–3:17.
[18] Y. Deldjoo, T. D. Noia, E. D. Sciascio, F. A. Merra, How dataset characteristics affect the robustness of collaborative recommendation models, in: SIGIR, ACM, 2020, pp. 951–960.
[19] B. Mobasher, R. Burke, R. Bhaumik, C. Williams, Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness, ACM Transactions on Internet Technology (TOIT) 7 (2007).
[20] M. P. O'Mahony, N. J. Hurley, G. C. M. Silvestre, Recommender systems: Attack types and strategies, in: AAAI, 2005, pp. 334–339.
[21] M. P. O'Mahony, N. J. Hurley, G. C. Silvestre, An evaluation of the performance of collaborative filtering, in: 14th Irish Artificial Intelligence and Cognitive Science (AICS 2003) Conference, Citeseer, 2003.
[22] M. P. O'Mahony, N. J. Hurley, N. Kushmerick, G. C. M. Silvestre, Collaborative recommendation: A robustness analysis, ACM Trans. Internet Techn. 4 (2004) 344–377.
[23] Y. Moshfeghi, B. Piwowarski, J. M. Jose, Handling data sparsity in collaborative filtering using emotion and semantic based features, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 625–634.
[24] P. Cremonesi, A. Tripodi, R. Turrin, Cross-domain recommender systems, in: ICDM Workshops, IEEE Computer Society, 2011, pp. 496–503.
[25] Y. Deldjoo, T. D. Noia, F. A. Merra, Assessing the impact of a user-item collaborative attack on class of users, in: ImpactRS@RecSys, volume 2462 of CEUR Workshop Proceedings, CEUR-WS.org, 2019.
[26] J. L. Herlocker, J. A. Konstan, J. Riedl, Explaining collaborative filtering recommendations, in: CSCW 2000, Philadelphia, PA, USA, December 2–6, 2000, pp. 241–250. doi:10.1145/358916.358995.
[27] C. C. Aggarwal, Recommender Systems - The Textbook, Springer, 2016. doi:10.1007/978-3-319-29659-3.
[28] F. M. Harper, J. A. Konstan, The MovieLens datasets: History and context, TiiS 5 (2016) 19:1–19:19.
[29] X. He, Z. He, X. Du, T. Chua, Adversarial personalized ranking for recommendation, in: SIGIR, ACM, 2018, pp. 355–364.
[30] M. Schedl, The LFM-1b dataset for music retrieval and recommendation, in: ICMR, ACM, 2016, pp. 103–110.
[31] N. Hug, Surprise, a Python library for recommender systems, 2017.
[32] Y. Koren, Factor in the neighbors: Scalable and accurate collaborative filtering, TKDD 4 (2010) 1:1–1:24.
[33] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer 42 (2009) 30–37.
[34] P. Cremonesi, Y. Koren, R. Turrin, Performance of recommender algorithms on top-n recommendation tasks, in: Proc. of the 2010 ACM Conference on Recommender Systems, RecSys 2010, 2010, pp. 39–46. doi:10.1145/1864708.1864721.