Revisiting Neighbourhood-Based Recommenders For Temporal Scenarios

Alejandro Bellogín, Universidad Autónoma de Madrid, Madrid, Spain, alejandro.bellogin@uam.es
Pablo Sánchez, Universidad Autónoma de Madrid, Madrid, Spain, pablo.sanchezp@estudiante.uam.es

ABSTRACT
Modelling the temporal context efficiently and effectively is essential to provide useful recommendations to users. Methods such as matrix factorisation and Markov chains have recently been combined to model the temporal preferences of users on a sequential basis. In this work, we focus on neighbourhood-based collaborative filtering and propose a simple technique that incorporates interaction sequences when producing a personalised ranking. We show the effectiveness of this method when compared against other sequence- and time-aware recommendation methods under two classical temporal evaluation methodologies.

1 INTRODUCTION
Recommender systems have become a necessary tool for a large number of online applications because of their ability to make personalised recommendations by adapting to user profiles. They are widely implemented in many online commercial platforms such as Amazon, Netflix, or Youtube, each of which may use different approaches: collaborative filtering, content-based methods, or hybrids of both. Since the purpose of these systems is to provide the best possible suggestions, different types of information can be added to the recommendations (location, type of product, time, ...), also known as contextual information. Among the different contexts, temporal information is one of the most interesting to integrate into recommendation algorithms, because it is easy to capture and because it usually discriminates better than other dimensions [1]. Nonetheless, its formalisation in the area has typically been proposed as heuristic filters, both in terms of algorithmic approaches and evaluation strategies [4].

One of the earliest and most popular collaborative filtering approaches is the neighbourhood-based recommender, either user-based or item-based (in this article we focus on the user-based variation). These approaches are normally represented as an aggregation function of the ratings from the k most similar users for an item i; usually, this aggregation function is represented as [6]:

\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} r_{vi} \, w_{uv}}{\sum_{v \in N_i(u)} |w_{uv}|}    (1)

where w_uv is the weight (or similarity) between users u and v, and N_i(u) represents the neighbours of user u that have rated item i. Different normalisation functions can be applied to this formula, like mean centering or Z-score [11, 18].

In this work, we generalise this classical formulation – borrowing ideas from Aggregated Search and Information Retrieval in general [2] – to one where each neighbour provides a list of suggestions for each user, which are later combined into a single ranking. Under this perspective, modelling the temporal aspect of user preferences is straightforward – as we shall show here – and provides an intuitive rationale about what is being recommended and why.

In the remainder of this paper, we answer the following research questions: (RQ1) Are neighbourhood-based recommenders competitive in temporal scenarios, especially when compared against methods based on matrix factorisation or Markov chains? (RQ2) Is there any advantage in using a rank aggregation formulation for this problem? Furthermore, how can we incorporate temporal sequences into this formulation?

After presenting our proposed method in detail in the next section, we address these research questions experimentally on real interactions from the Epinions website, using two evaluation methodologies to derive the temporal split. As we shall see, the empirical results validate our approach, showing performance improvements over state-of-the-art memory-based alternatives and recent algorithms specifically tailored for sequential recommendation.

2 INTEGRATING TEMPORAL SEQUENCES IN NEIGHBOUR-BASED RECOMMENDERS
Neighbourhood-based recommenders can be revisited as ranking fusion algorithms where each neighbour contributes a ranking (of potentially relevant items for the target user) and the goal of the recommender system is to combine these rankings into one final output. In terms of Aggregated Search [8, 15], each neighbour would be denoted as a judge (in Information Retrieval, these judges are usually different search engines) who gives a complete ordering of all the alternative items to be ranked; each of these rankings is denoted as τ, and the final fused ranking as τ̂. Formally, the process of rank aggregation is divided into normalisation (where the scores or the ranks of τ are normalised into a common scale, w_τ(i), for each item i) and combination (where the normalised weights w_τ(i) are combined into one fused score).

There are several methods for each of these stages; see [15] for an in-depth review of the most prominent ones. Interestingly, by taking the identity normaliser for the scores (w_τ(i) = τ(i)) and the so-called CombSUM combiner (where the normalised weights are simply added for each item), with a preference weight for each ranking equal to the similarity between the neighbour and the target user, we obtain a linear combination of the normalised weights, which is equivalent to the classical formulation of a neighbourhood-based recommender.
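To make Equation 1 concrete, the following minimal sketch computes the user-based kNN score for a single item; every user, item, rating, and similarity value below is hypothetical and chosen only to keep the arithmetic easy to follow:

```python
def knn_predict(ratings, sims, u, i):
    """User-based kNN prediction (Equation 1): similarity-weighted
    average of the ratings given to item i by u's neighbours."""
    num = den = 0.0
    for v, items in ratings.items():
        if v == u or i not in items:
            continue  # keep only N_i(u): neighbours of u that rated i
        w = sims[u][v]
        num += items[i] * w
        den += abs(w)
    return num / den if den else 0.0

# Hypothetical toy data (not taken from the paper).
ratings = {
    "u": {"i1": 4.0},
    "v1": {"i1": 5.0, "i2": 3.0},
    "v2": {"i1": 2.0, "i2": 5.0},
}
sims = {"u": {"v1": 0.8, "v2": 0.4}}
print(round(knn_predict(ratings, sims, "u", "i2"), 3))  # (0.8*3 + 0.4*5) / 1.2
```

Here the full user base plays the role of the neighbourhood; in practice only the k most similar users would enter the sum.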
RecTemp '17, August 2017, Como, Italy
Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.

Figure 1: Example of user interactions in the movie domain. The yellow border corresponds to the last common interaction between u and each neighbour, the red border represents those movies included in L_2^-(v), and the green border those in L_2^+(v).

In fact, when we take the ratings of the neighbours into account, the "score" of user u for item i using CombSUM and the identity normaliser reproduces the numerator of Equation 1. In this situation, each ranking τ is composed of the item-rating pairs rated by a particular neighbour, excluding, as is standard practice in the community, those items already rated by the target user in training. Further extensions and ad-hoc modifications could be made to these normalisers and combiners so that other formulations of this problem – such as mean-centering or Z-score normalisation [6] – can be achieved.

Once we have reformulated the problem of neighbourhood-based recommendation as a ranking fusion technique, we can describe how to incorporate temporal information in the process. The main idea is that each neighbour finds her last common interaction with the target user and creates a ranking of her candidate alternatives around that item, taking into account the order in which she rated each of those alternatives. Although in this case we do not consider the actual moment of the interaction (i.e., we may end up recommending items from a neighbour whose last common item was rated a long time ago), this approach could easily be improved by filtering out neighbours whose last common item was rated before a certain threshold date; in this paper we do not explore this option and leave it as future work.
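The reformulation above can be sketched as a small rank-fusion routine: each neighbour contributes one ranking τ of item-rating pairs, the identity normaliser keeps the ratings as scores, and a similarity-weighted CombSUM adds them up per item while skipping items the target user already rated in training. Profiles and similarity values below are hypothetical; note that each fused score is exactly the numerator of Equation 1, which suffices when the recommender is optimised for ranking:

```python
from collections import defaultdict

def fuse_neighbour_rankings(profiles, sims, rated_by_target):
    """Similarity-weighted CombSUM with the identity normaliser:
    sum each neighbour's (rating * similarity) per item, excluding
    items already rated by the target user, then rank by fused score."""
    fused = defaultdict(float)
    for v, tau in profiles.items():          # tau: one ranking per neighbour
        for item, rating in tau:
            if item not in rated_by_target:  # standard training exclusion
                fused[item] += sims[v] * rating
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical neighbour profiles (item, rating) and similarities.
profiles = {"v1": [("i1", 5.0), ("i2", 3.0)], "v2": [("i2", 5.0), ("i3", 4.0)]}
sims = {"v1": 0.8, "v2": 0.4}
print(fuse_neighbour_rankings(profiles, sims, rated_by_target={"i1"}))
```

Swapping the normaliser or combiner in this routine yields the other formulations mentioned above.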
Note that the temporal aspect is considered twice in this model: it is used to involve the target user (through the last common interaction) in setting the actual moment (context) of the recommendation and, at the same time, to exploit the actual (temporal) order in which the neighbour interacted with the items. In the following, we present our model, consisting of a method to compute the last common interaction and of different strategies to exploit the order of the neighbour ratings.

The last common interaction between two users u and v is:

n^*(u; v) = \max_k \{ i_k \in I_u^t : i_k \in I_v^t \}    (2)

where I_u^t are the items rated by user u ordered by timestamp in ascending order (recent interactions appear later in the list), that is:

I_u^t = \mathrm{sort}(I_u, t) = \left( i_k^t \right)_{k=1}^{|I_u|}, \quad \text{with } t(i_k^t) < t(i_{k+1}^t)    (3)

Note that the last common interaction n* will not be symmetrical in general – that is, n*(u; v) ≠ n*(v; u) – since it makes reference to the preferences of the first user.

Once we have sorted the preferences by timestamp (I_u^t and I_v^t) and calculated the last common interactions (n*(u; v) and n*(v; u)) for target user u and neighbour v, we propose three strategies to build the lists of candidate items from each neighbour: (a) taking the most recent m items rated by the neighbour after the last common interaction (we denote this list as L_m^+(v) and the strategy as forward or F); (b) taking the most recent m items rated before the last common interaction (L_m^-(v), backward or B); and (c) concatenating the m1 items rated after and the m2 items rated before the last common interaction (L_{m1,m2}^±(v), backward-forward or BF).

In summary, when using this formalisation we obtain a model equivalent to classical formulations that can further incorporate the temporal information under different models.

Finally, let us illustrate the whole process with the example shown in Figure 1, from the movie domain. For the sake of simplicity, we do not include the users' ratings, so the reader should assume that all sequences are composed of items that the user has equally liked. In the case of movies, the temporal component is usually determinant, since newer movies tend to be consumed more often than older ones. In the presented example, user u is the user to whom we want to make the recommendations, and we represent three neighbours v1, v2, and v3, where v1 and v3 have 3 items in common with the target user whereas v2 shares 4.

According to these interactions, the candidate items generated with respect to the different strategies (limited to size 2) will be (considering that n*(v1; u) = i9, n*(v2; u) = i10, n*(v3; u) = i7):

L_2^+(v1) = (i14, i13),  L_2^-(v1) = (i6, i2),  L_{1,1}^±(v1) = (i14, i6)
L_2^+(v2) = (i12, i13),  L_2^-(v2) = (i2),      L_{1,1}^±(v2) = (i12, i2)
L_2^+(v3) = (i12, i15),  L_2^-(v3) = (i5, i6),  L_{1,1}^±(v3) = (i12, i5)

Let us now consider that items i12, i13, and i14 are in the test set (as mentioned before, newer films are more likely to be chosen by user u). A standard neighbourhood-based recommender which does not take the temporal aspect into account would probably recommend item i2, whereas in our approach this item appears only once for strategy L^± and twice for L^-, mostly in favour of the more recent movie i12.
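The construction of these candidate lists can be sketched as follows. The sequences below are hypothetical (the full timelines behind Figure 1 are not reproduced here), and the sketch assumes, following the worked example, that the lists are returned most-recent-first, that the BF list concatenates the forward items with the backward ones, and that at least one common item exists:

```python
def candidate_lists(u_seq, v_seq, m1, m2):
    """Build the forward (F), backward (B) and backward-forward (BF)
    candidate lists from a neighbour's sequence. Both sequences are
    ordered by timestamp, oldest first."""
    rated_by_u = set(u_seq)
    # position in v's timeline of the last common interaction n*(v; u)
    n_star = max(k for k, item in enumerate(v_seq) if item in rated_by_u)
    # candidate items only: drop items already rated by the target user
    before = [i for i in v_seq[:n_star] if i not in rated_by_u]
    after = [i for i in v_seq[n_star + 1:] if i not in rated_by_u]
    l_forward = list(reversed(after))[:m1]    # most recent m1 after n*
    l_backward = list(reversed(before))[:m2]  # most recent m2 before n*
    return l_forward, l_backward, l_forward + l_backward

# Hypothetical sequences: u and v share items "a" and "b" ("b" is the
# last common interaction in v's timeline).
u_seq = ["a", "b", "c"]
v_seq = ["x", "a", "y", "b", "z", "w"]
print(candidate_lists(u_seq, v_seq, m1=2, m2=2))
```

Each returned list would then be fused with the lists of the remaining neighbours, as described in the previous section.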
More specifically, let I^t(v; u) = \mathrm{sort}(I_v \setminus I_u, t) be the timestamp-ordered sequence of candidate items of neighbour v (those not rated by the target user u). The lists above are then generated as follows:

L_m^{+}(v) = \left( i_k^t \right)_{k=n^*}^{n^*+m}, \quad i_k^t \in I^t(v; u)    (4)

L_m^{-}(v) = \left( i_k^t \right)_{k=n^*-m}^{n^*}, \quad i_k^t \in I^t(v; u)    (5)

L_{m_1,m_2}^{\pm}(v) = \left( L_{m_1}^{+}(v), L_{m_2}^{-}(v) \right)    (6)

Therefore, for each neighbour we obtain a list L(v) with all the candidate items from that neighbour, which will later be normalised and combined, as explained before, to produce a single ranking containing the recommendations for the target user. Moreover, we believe that moving forward from the last common interaction is more useful in terms of recommendation performance – especially for novelty purposes –, as evidenced by the strategies L^+ and L^± recommending i13 and i14 in the example above.

3 EMPIRICAL EVALUATION

3.1 Evaluation Methodology
We test the proposed approach on a dataset collected from Epinions.com by the authors of [21], also used in [10]. It includes all actions of all users on the website from January 2001 to November 2013, for a total of 193,571 actions on 42,447 items by 117,323 users; it hence has a density of 0.004%.¹ This dataset naturally fits the purpose of exploiting the temporal dimension of user preferences, since it represents an unbiased sample of the website; other datasets more common in the area – like MovieLens – are not well suited because they have been filtered or their timestamps are artificial [9].

¹ Note these statistics do not match those from [10] or [21] because we do not put any constraint on the minimum number of ratings of users and items.

As described in [4], there are several evaluation conditions worth exploring when evaluating time-aware recommender systems. In this work, we use a time-dependent rating order (the timestamps of the test split for each user occur after those of the training split) in two evaluation methodologies: one with a user-centred base set and a fixed-size condition (the last 2 actions of each user with at least 4 actions are included in the test split), and another with a community-centred base set and a proportion-based size condition (the same timestamp is used for all users, in such a way that we retain the data corresponding to the 80% most recent rating times for training, and the rest for testing). We name the first configuration Fix and the second CC. There are obvious differences between these two evaluation methodologies: whereas in CC the test set is always (for every user) after the training set, in Fix this may not be the case; besides, (almost) every user will be included in the test set of Fix, which will not be the case in CC. In other terms, the CC methodology better represents a real environment, where some users may not be active at a given point, whereas with the Fix evaluation we can analyse the recommendations for all the users, not only the most active (or active in the last period of time) ones.

Using the terminology in [19], we report the results obtained by following the TrainItems strategy to select the candidate items to be ranked by each algorithm; that is, a ranking is generated for each user by predicting a score for every item that has a rating in the training set. We then compute standard Information Retrieval metrics on the ranking, considering as relevant every item rated with a 5 in the test split. We report here the values for precision and recall at 5 and 50, and nDCG at 5 and 10. We also report the user space coverage metric (cvg) as defined in [20], that is, the number of users for whom the system is able to recommend at least one item. Complete code and evaluation scripts can be found in the following Bitbucket repository: PabloSanchezP/BFRecommendation.

3.2 Recommendation Algorithms
We compare our methods against different well-known state-of-the-art recommenders. We report a popularity-based recommender (ItemPop) that recommends items based on their popularity in the system. We also include a classical nearest-neighbour recommender optimised for ranking (that is, without normalisation, as proposed in [5]) using the Jaccard coefficient as similarity (KNN). A modification of this algorithm is also tested that includes an exponential time decay weight, as introduced in [7] (TD). These three algorithms are based on the implementations found in the RankSys² library.

² http://ranksys.org

Additionally, we include some purely sequential algorithms as a comparison with the results reported in [10] (we use the same implementation as that paper). In the model named FMC (Factorised Markov Chains), the item-to-item transition matrix is factorised to capture the likelihood that an arbitrary user transitions from one item to another, using a first-order Markov chain. As an extension to this method, we include Factorised Personalised Markov Chains (FPMC), a method that combines matrix factorisation and first-order Markov chains [17]. Finally, we also include as a baseline the Fossil method introduced in [10] (Factorised Sequential Prediction with Item Similarity Models), where Markov chains are combined with a similarity-based algorithm.

We compare these baselines against different instantiations of the rank aggregation formulation presented in Section 2. For the sake of space, we only report results when the backward-forward (BF) strategy is used to build the lists of candidate items, mostly due to its better performance with respect to the backward and forward strategies. We do experiment with a variation where the similarity is used to weight the contribution of each neighbour (as in standard user-based CF) and denote it as BFwCF; when no weight is used in the combination step, we denote it as BFuCF. Unless stated otherwise, we use 100 neighbours in KNN and TD, a λ factor of 1/200 in TD, and L = 1, K = 10, λ_Θ = 0.1, and α = 0.2 in FMC, FPMC, and Fossil, as specified in the original paper [10] for this dataset; that is, we have not performed an exhaustive search to find the optimal parameters of these algorithms.

3.3 Results and Discussion
As described previously, we test our approaches under two evaluation conditions that consider the temporal dimension differently when splitting the dataset into training and test sets. As shown in Table 1, the CC methodology is slightly more difficult than Fix, as evidenced by the lower values obtained by the baseline algorithms in most of the metrics. This observation is in line with previous results in the area [4]. It should be noted that CC replicates a situation closer to what we would find in an online experiment, where the test split is set in the future no matter which user we are considering; this is not true in Fix, where the test split of each user could exist at different timestamps. On the other hand, in Fix every user contains the same number of test interactions, which may decrease the bias towards users who are more active in the most recent portion of the dataset.

We now assess the research questions RQ1 and RQ2 raised at the beginning of the paper, in light of the results summarised in Table 1. To address RQ1, we compare the performance of FMC, FPMC, and Fossil (combinations of Markov chains and matrix factorisation) against the KNN baseline. We observe that, in contrast to the original paper [10], Fossil is not always the best-performing method amongst these baselines (this only holds when using the CC methodology). Actually, the results obtained for these methods are worse than those of KNN in both methodologies. We argue that a possible reason for this inconsistency with respect to the previously reported results is that here we evaluate using a more common setting in the area (top-N recommendation) rather than AUC, which these algorithms are optimised for. Furthermore, in [10] no classical recommender algorithms like KNN appear in the comparison.

It is also interesting to note that a time decay modification of KNN does not improve under this setting unless many items are considered in the ranking, since TD outperforms KNN only for Precision@50 and Recall@50. In any case, it seems that basic KNN algorithms are competitive against state-of-the-art algorithms specifically designed to address sequential recommendation.

Table 1: Summary of comparative effectiveness, including the improvement (∆) in terms of nDCG@5 with respect to two of the baselines. Best values for each evaluation methodology and metric are marked with an asterisk (*).
(a) CC methodology

| Method  | Precision@5 | nDCG@5    | Recall@5  | nDCG@10   | Precision@50 | Recall@50 | cvg      | ∆ wrt KNN | ∆ wrt Fossil |
|---------|-------------|-----------|-----------|-----------|--------------|-----------|----------|-----------|--------------|
| ItemPop | 1.81E-04    | 8.89E-04  | 2.25E-03  | 1.21E-03  | 3.80E-04*    | 4.69E-02* | 100.00%* | -144.08%  | -36.88%      |
| KNN     | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.94E-03  | 4.59E-05     | 4.34E-03  | 100.00%* | –         | 43.92%       |
| TD      | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.17E-03  | 6.88E-05     | 6.51E-03  | 100.00%* | 0.00%     | 43.92%       |
| FMC     | 0.00E+00    | 0.00E+00  | 0.00E+00  | 4.49E-04  | 2.69E-05     | 1.22E-03  | 85.21%   | NA        | NA           |
| FPMC    | 0.00E+00    | 0.00E+00  | 0.00E+00  | 0.00E+00  | 2.69E-05     | 1.22E-03  | 85.21%   | NA        | NA           |
| Fossil  | 2.69E-04    | 1.22E-03  | 2.43E-03  | 1.22E-03  | 2.69E-05     | 2.43E-03  | 85.21%   | -78.31%   | –            |
| BFuCF   | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.17E-03  | 6.88E-05     | 4.49E-03  | 100.00%* | 0.00%     | 43.92%       |
| BFwCF   | 4.59E-04*   | 3.10E-03* | 4.34E-03* | 3.10E-03* | 9.17E-05     | 6.66E-03  | 100.00%* | 30.10%    | 60.80%       |

(b) Fix methodology

| Method  | Precision@5 | nDCG@5    | Recall@5  | nDCG@10   | Precision@50 | Recall@50 | cvg      | ∆ wrt KNN | ∆ wrt Fossil |
|---------|-------------|-----------|-----------|-----------|--------------|-----------|----------|-----------|--------------|
| ItemPop | 3.32E-04    | 1.28E-03  | 1.74E-03  | 2.12E-03  | 4.06E-04*    | 2.19E-02* | 100.00%* | -200.02%  | -5.48%       |
| KNN     | 1.05E-03*   | 3.84E-03  | 5.56E-03* | 4.94E-03* | 3.83E-04     | 2.05E-02  | 97.20%   | –         | 64.84%       |
| TD      | 3.15E-04    | 1.09E-03  | 1.99E-03  | 1.62E-03  | 2.97E-04     | 1.68E-02  | 97.20%   | -252.10%  | -23.79%      |
| FMC     | 4.34E-04    | 1.39E-03  | 2.42E-03  | 2.27E-03  | 2.91E-04     | 1.57E-02  | 100.00%* | -176.32%  | 2.85%        |
| FPMC    | 4.08E-04    | 1.08E-03  | 1.93E-03  | 1.67E-03  | 2.20E-04     | 1.10E-02  | 100.00%* | -255.14%  | -24.86%      |
| Fossil  | 3.32E-04    | 1.35E-03  | 1.64E-03  | 2.28E-03  | 2.60E-04     | 1.38E-02  | 100.00%* | -184.43%  | –            |
| BFuCF   | 1.05E-03*   | 3.89E-03* | 5.56E-03* | 4.75E-03  | 3.62E-04     | 1.96E-02  | 97.20%   | 1.39%     | 65.33%       |
| BFwCF   | 9.46E-04    | 3.48E-03  | 4.96E-03  | 4.65E-03  | 3.55E-04     | 1.91E-02  | 97.20%   | -10.50%   | 61.15%       |

Indeed, the basic KNN approach even beats the ItemPop recommender, which is frequently a very strong baseline due to the inherent popularity bias found in this type of system [12].

We now focus on the models generated using the rank aggregation formulation to address RQ2. We observe that these models (BFuCF and BFwCF) show very positive results in both evaluation methodologies. In fact, we experimented with several instantiations of the framework described in Section 2. We found that the best normalisation method is the identity normaliser, since a rank-based approach or the standard normaliser [15] produces worse results (not reported here because of space constraints). Our preliminary results also showed that the best strategy to select the candidate items is to generate lists as L_{m1,m2}^±; because of this, we report in Table 1 the results for this strategy using m1 = m2 = 5, Jaccard as similarity, and 100 neighbours, so that the comparison is as fair as possible with respect to the rest of the baselines, even though an exhaustive tuning of these parameters might achieve better performance values.

When the proposed approaches are used, a high improvement is achieved with respect to both the KNN and Fossil baselines; however, depending on the actual evaluation methodology, our approach may actually degrade performance (see the Fix methodology). In these cases, the coverage is limited by the coverage of the user similarity, which results in the same coverage as that obtained by the KNN and TD algorithms. It is interesting to observe that the largest improvement is obtained for the more realistic scenario, that is, the CC methodology. Regarding the difference between the similarity-weighted (BFwCF) and unweighted (BFuCF) versions of these methods, the conclusions are not clear, since the benefit of this weighting seems to depend on how the split was performed. We hypothesise that the similarity values in the CC methodology are more meaningful because the timeline is the same for every user, even though the similarity used (Jaccard) does not take the temporal dimension into account.

4 CONCLUSIONS AND FUTURE WORK
In this paper we have presented a new formulation for neighbourhood-based recommendation that allows the temporal dimension to be integrated seamlessly and, according to the reported experiments, successfully. The two research questions posed have been answered positively, evidencing that this type of recommendation algorithm is competitive in temporal scenarios, outperforming recent state-of-the-art algorithms specifically tailored to sequential recommendation. Furthermore, when the recommendation problem is formulated as an aggregation of several rankings and the temporal dimension is introduced in the process, the performance clearly improves: by up to 30% with respect to a neighbourhood-based recommender that does not use the temporal component, and by up to 65% with respect to a sequential baseline.

Since the framework introduced in this work is general enough to work with other aggregation functions, in the future we plan to explore the behaviour of our proposal when alternative aggregation functions – such as those based on the score distribution [14] – are used. Furthermore, an exhaustive analysis – with more datasets, more baselines such as SVD++ [13] and BPR for implicit data [16], and more evaluation methodologies – should be made to better understand each component of the proposed model, for instance, the number of items allowed to be selected after and before the last common interaction, and the impact of the similarity when weighting the final result. An important aspect that deserves further research is the definition of sequence-aware similarity metrics [3], so that the temporal dimension can also be considered when selecting the neighbours in the proposed approach.

Acknowledgments
This work was funded by the Spanish Government under project TIN2016-80630-P.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer, 191–226. https://doi.org/10.1007/978-1-4899-7637-6_6
[2] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval – the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England.
[3] Alejandro Bellogín and Pablo Sánchez. 2017. Collaborative Filtering based on Subsequence Matching: A New Approach. Submitted to Information Sciences (2017).
[4] Pedro G. Campos, Fernando Díez, and Iván Cantador. 2014. Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols. User Model. User-Adapt. Interact. 24, 1-2 (2014), 67–119.
[5] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In RecSys. ACM, 39–46.
[6] Christian Desrosiers and George Karypis. 2011. A Comprehensive Survey of Neighborhood-based Recommendation Methods. In Recommender Systems Handbook. 107–144.
[7] Yi Ding and Xue Li. 2005. Time weight collaborative filtering. In CIKM. ACM, 485–492.
[8] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. 2001. Rank aggregation methods for the Web. In WWW. ACM, 613–622. https://doi.org/10.1145/371920.372165
[9] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19. https://doi.org/10.1145/2827872
[10] Ruining He and Julian McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In ICDM. IEEE, 191–200.
[11] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. 1999. An Algorithmic Framework for Performing Collaborative Filtering. In SIGIR. ACM, 230–237. https://doi.org/10.1145/312624.312682
[12] Dietmar Jannach, Lukas Lerche, Iman Kamehkhosh, and Michael Jugovac. 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User-Adapt. Interact. 25, 5 (2015), 427–491. https://doi.org/10.1007/s11257-015-9165-3
[13] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. ACM, 426–434. https://doi.org/10.1145/1401890.1401944
[14] R. Manmatha, Toni M. Rath, and Fangfang Feng. 2001. Modeling Score Distributions for Combining the Outputs of Search Engines. In SIGIR. ACM, 267–275. https://doi.org/10.1145/383952.384005
[15] M. Elena Renda and Umberto Straccia. 2003. Web Metasearch: Rank vs. Score Based Rank Aggregation Methods. In SAC. ACM, 841–846. https://doi.org/10.1145/952532.952698
[16] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461.
[17] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW. ACM, 811–820.
[18] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In CSCW. ACM, 175–186. https://doi.org/10.1145/192844.192905
[19] Alan Said and Alejandro Bellogín. 2014. Comparative recommender system evaluation: benchmarking recommendation frameworks. In RecSys. ACM, 129–136.
[20] Guy Shani and Asela Gunawardana. 2011. Evaluating Recommendation Systems. In Recommender Systems Handbook. 257–297.
[21] Tong Zhao, Julian J. McAuley, and Irwin King. 2014. Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering. In CIKM. ACM, 261–270.