=Paper=
{{Paper
|id=Vol-1922/paper8
|storemode=property
|title=Revisiting Neighbourhood-Based Recommenders For Temporal Scenarios
|pdfUrl=https://ceur-ws.org/Vol-1922/paper8.pdf
|volume=Vol-1922
|authors=Alejandro Bellogín,Pablo Sánchez
|dblpUrl=https://dblp.org/rec/conf/recsys/BelloginS17
}}
==Revisiting Neighbourhood-Based Recommenders For Temporal Scenarios==
Alejandro Bellogín, Universidad Autónoma de Madrid, Madrid, Spain (alejandro.bellogin@uam.es)
Pablo Sánchez, Universidad Autónoma de Madrid, Madrid, Spain (pablo.sanchezp@estudiante.uam.es)
ABSTRACT
Modelling the temporal context efficiently and effectively is essential to provide useful recommendations to users. Methods such as matrix factorisation and Markov chains have recently been combined to model the temporal preferences of users on a sequential basis. In this work, we focus on neighbourhood-based collaborative filtering and propose a simple technique that incorporates interaction sequences when producing a personalised ranking. We show the efficiency of this method when compared against other sequence- and time-aware recommendation methods under two classical temporal evaluation methodologies.

RecTemp '17, August 2017, Como, Italy
Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.

1 INTRODUCTION
Recommender Systems have become a necessary tool for a large number of online applications because of their ability to make personalised recommendations by adapting to user profiles. They are widely implemented in many online commercial platforms like Amazon, Netflix, YouTube, etc. Each of them can use different approaches, such as collaborative filtering, content-based, and hybrid approaches. Since the purpose of these systems is to provide the best possible suggestions, different types of information can be added to the recommendations (location, type of product, time, ...), also known as contextual information. Among the different contexts, temporal information is one of the most interesting to integrate into recommendation algorithms, because it is easy to capture and because it usually discriminates better than other dimensions [1]. Nonetheless, its formalisation in the area has typically been proposed as heuristic filters, both in terms of algorithmic approaches and evaluation strategies [4].

One of the earliest and most popular collaborative filtering approaches is the neighbourhood-based recommender, either user-based or item-based (in this article we focus on the user-based variation). These approaches are normally represented as an aggregation function of the ratings from the k most similar users over item i; usually, this aggregation function is represented as [6]:

    r̂_ui = Σ_{v ∈ N_i(u)} r_vi · w_uv / Σ_{v ∈ N_i(u)} |w_uv|    (1)

where w_uv is the weight (or similarity) between users u and v, and N_i(u) represents user u's neighbours that have rated item i. Different normalisation functions can be applied to this formula, like mean centering or Z-score [11, 18].

In this work, we generalise this classical formulation – borrowing ideas from Aggregated Search and Information Retrieval in general [2] – to one where each neighbour provides a list of suggestions for each user, which are later combined into a single ranking. Under this perspective, modelling the temporal aspect of user preferences is straightforward – as we shall show here – and provides an intuitive rationale about what is being recommended and why. In the remainder of this paper, we will answer the following research questions: (RQ1) Are neighbourhood-based recommenders competitive in temporal scenarios, especially when compared against methods based on matrix factorisation or Markov chains? (RQ2) Is there any advantage in using a rank aggregation formulation for this problem? Furthermore, how can we incorporate temporal sequences into this formulation?

After presenting our proposed method in detail in the next section, we address the research questions experimentally on real interactions from the Epinions website, using two evaluation methodologies to derive the temporal split. As we shall see, the empirical results validate our approach, showing performance improvements over state-of-the-art memory-based alternatives and recent algorithms specifically tailored for sequential recommendation.

2 INTEGRATING TEMPORAL SEQUENCES IN NEIGHBOUR-BASED RECOMMENDERS
Neighbourhood-based recommenders can be revisited as ranking fusion algorithms where each neighbour contributes a ranking (of potentially relevant items for the target user) and the goal of the recommender system is to combine these rankings into one final output. In terms of Aggregated Search [8, 15], each neighbour would be denoted as a judge (in Information Retrieval these judges are usually different search engines) who gives a complete ordering of all the alternative items to be ranked; each of these rankings is denoted as τ, and the final fused ranking is τ̂. Formally, the process of rank aggregation is divided into normalisation (where the scores or the ranks of τ are normalised into a common scale, w_τ(i), for each item i) and combination (where the normalised weights w_τ(i) are combined into one fused score).

There are several methods for each of these stages; see [15] for an in-depth review of the most prominent ones. Interestingly, by taking the identity normaliser for the scores (w_τ(i) = τ(i)) and the so-called CombSUM combiner (where the normalised weights are simply added for each item), with a preference weight for each ranking equal to the similarity between the neighbour and the target user, we obtain a linear combination of the normalised weights, which is equivalent to the classical formulation of a neighbourhood-based recommender. In fact, when we take into account the ratings of the neighbours, the "score" of user u for item i using CombSUM and the identity normaliser produces the numerator of Equation 1. In this situation, each ranking τ is composed of the item-rating pairs rated by a particular neighbour, excluding, as a standard practice in the community, those items already rated by the target user in training. Further extensions and ad-hoc modifications could be made to these normalisers and combiners so that other formulations of this problem – such as mean-centering or Z-score normalisation [6] – can be achieved.
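As a concrete illustration of this equivalence, the following sketch (our own hypothetical helper names, not the authors' released code) fuses neighbour rankings with the identity normaliser and CombSUM, weighting each neighbour's list by the user-user similarity; accumulating w_uv · r_vi per item reproduces the numerator of Equation 1.

```python
from collections import defaultdict

def jaccard(profile_u, profile_v):
    """Jaccard similarity between two sets of rated items."""
    union = len(profile_u | profile_v)
    return len(profile_u & profile_v) / union if union else 0.0

def combsum_fuse(neighbour_ratings, sims, seen):
    """CombSUM with the identity normaliser: each neighbour v contributes
    its ranking tau (item -> rating), weighted by the similarity sims[v].
    Items already rated by the target user (`seen`) are excluded."""
    scores = defaultdict(float)
    for v, tau in neighbour_ratings.items():
        for item, r_vi in tau.items():
            if item not in seen:
                scores[item] += sims[v] * r_vi  # numerator of Equation 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy example with hypothetical users and items
profiles = {"u": {"x", "y"}, "v1": {"x", "a", "b"}, "v2": {"y", "b"}}
sims = {v: jaccard(profiles["u"], profiles[v]) for v in ("v1", "v2")}
ratings = {"v1": {"a": 5, "b": 3}, "v2": {"b": 4}}
ranking = combsum_fuse(ratings, sims, seen=profiles["u"])
```

Restricting `neighbour_ratings` to the k most similar users and dividing by the sum of |w_uv| would recover the full normalised form of Equation 1.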
Once we have reformulated the problem of neighbourhood-based
recommendation as a ranking fusion technique, we now describe
how we can incorporate temporal information in the process. The
main idea is that each neighbour will find which is her last
common interaction with the target user and will create a
ranking of her candidate alternatives iterating around that
item, taking into account the order in which she rated each of
those alternatives. Although in this case we are not taking into account the actual moment of the interaction (i.e., we can end up recommending items from a neighbour whose last common item was rated a long time ago), we can easily improve this approach by filtering out neighbours whose last common item was rated before a certain threshold date; in this paper we will not explore this option and leave it as future work.

Figure 1: Example of user interactions in the movie domain. The yellow border corresponds to the last common interaction between u and each neighbour, the red border represents those movies included in L−_2(v), and the green border those in L+_2(v).
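The step just described – locating a neighbour's last common interaction with the target user in her timestamp-ordered history – can be sketched as follows (a minimal illustration with names of our own choosing, not the paper's implementation):

```python
def last_common_interaction(history_v, items_u):
    """Return (index, item) of the most recent item in neighbour v's
    timestamp-ordered history that target user u also interacted with,
    or (None, None) when the two users share no items.

    history_v: list of item ids sorted by ascending timestamp.
    items_u:   set of item ids in the target user's profile.
    """
    for k in range(len(history_v) - 1, -1, -1):  # scan from most recent
        if history_v[k] in items_u:
            return k, history_v[k]
    return None, None

# Hypothetical data: v's timeline and u's rated items
k, item = last_common_interaction(["i1", "i9", "i4", "i7"], {"i9", "i2"})
```

Note that swapping the roles of the two users generally yields a different interaction, since the scan follows the first user's timeline.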
Note that the temporal aspect is considered twice in this model: it is used to involve the target user (through the last common interaction) in setting the actual moment (context) of the recommendation and, at the same time, to exploit the actual (temporal) order in which the neighbour interacted with the items. In the following, we present our model, consisting of a method to compute the last common interaction and different strategies to exploit the order of the neighbour ratings.

The last common interaction between two users u and v is:

    n*(u; v) = max_k { i_k ∈ I^t_u : i_k ∈ I^t_v }    (2)

where I^t_u are the items rated by user u ordered by timestamp in ascending order (recent interactions appear later in the list), that is:

    I^t_u = sort(I_u, t) = (i^t_k), k = 1, ..., |I_u|, with t(i^t_k) < t(i^t_{k+1})    (3)

Note that the last common interaction n* will not be symmetrical in general – that is, n*(u; v) ≠ n*(v; u) – since it makes reference to the preferences of the first user.

Once we have sorted the preferences by timestamp (I^t_u and I^t_v) and calculated the last common interactions (n*(u; v) and n*(v; u)) for target user u and neighbour v, we propose three strategies to build the lists with candidate items from each neighbour: (a) taking the most recent m items rated by the neighbour after the last common interaction (we denote this list as L+_m(v) and the strategy as forward or F), (b) taking the most recent m items rated before the last common interaction (L−_m(v), backward or B), and (c) concatenating the m_1 items rated before and the m_2 items rated after the last common interaction (L±_{m1,m2}(v), backward-forward or BF). More specifically, these lists are generated as follows. Let I^t(v; u) = sort(I_v − I_u, t); then:

    L+_m(v) = (i^t_k), k = n*, ..., n* + m, with i^t_k ∈ I^t(v; u)    (4)
    L−_m(v) = (i^t_k), k = n* − m, ..., n*, with i^t_k ∈ I^t(v; u)    (5)
    L±_{m1,m2}(v) = (L−_{m1}(v), L+_{m2}(v))    (6)

Therefore, for each neighbour we obtain a list L(v) with all the candidate items from that neighbour, which will later be normalised and combined, as explained before, to produce a single ranking containing the recommendations for the target user.

In summary, when using this formalisation, we obtain a model equivalent to classical formulations that can further incorporate the temporal information under different models.

Finally, let us illustrate the whole process with the example shown in Figure 1 using the movie domain. For the sake of simplicity, we do not include the users' ratings, so the reader should assume that all sequences are composed of articles that the user has equally liked. In the case of movies, the temporal component is usually determinant, since newer movies tend to be consumed more often than older ones. In the presented example, user u is the user to whom we want to make the recommendations, and we represent three neighbours v1, v2, and v3, where v1 and v3 have 3 items in common whereas v2 shares 4 items with the target user. According to these interactions, the candidate items generated with respect to the different strategies presented before (limited to size 2) will be (considering that n*(v1; u) = i9, n*(v2; u) = i10, n*(v3; u) = i7):

    L+_2(v1) = (i14, i13), L−_2(v1) = (i6, i2), L±_{1,1}(v1) = (i14, i6)
    L+_2(v2) = (i12, i13), L−_2(v2) = (i2), L±_{1,1}(v2) = (i12, i2)
    L+_2(v3) = (i12, i15), L−_2(v3) = (i5, i6), L±_{1,1}(v3) = (i12, i5)

Let us now consider that items i12, i13 and i14 are in the test set (as mentioned before, newer films are more likely to be chosen by user u). A standard neighbourhood-based recommender which does not take the temporal aspect into account would probably recommend item i2, whereas this item only appears once in our approach (for strategy L±) and twice (for L−), mostly in favour of the more recent movie i12. Moreover, we believe that moving forward from the last common interaction is more useful in terms of recommendation performance, especially for novelty purposes; this is evidenced by the strategies L+ and L±, which recommend i13 and i14.

3 EMPIRICAL EVALUATION

3.1 Evaluation Methodology
We test the proposed approach on a dataset collected from Epinions.com by the authors of [21], also used in [10]. It includes all actions of all users on the website spanning January 2001 to November 2013, for a total of 193,571 actions on 42,447 items by
117,323 users; it hence has a density of 0.004% (note that these statistics do not match those from [10] or [21] because we do not put any constraint on the minimum number of ratings of users and items). This dataset fits naturally the purpose of exploiting the temporal dimension of the user preferences, since it represents an unbiased sample of the website; other datasets more common in the area – like MovieLens – are not well-suited because they have been filtered or their timestamps are artificial [9].

As described in [4], there are several evaluation conditions worthy of exploration when evaluating time-aware recommender systems. In this work, we use a time-dependent rating order (the timestamps of the test split for each user occur after those of the training split) in two evaluation methodologies: one with a user-centered base set and a fixed size condition (the last 2 actions of each user with at least 4 actions are included in the test split) and another with a community-centered base set and a proportion-based size condition (the same timestamp is used for all the users, in such a way that we retain the data corresponding to the 80% of the most recent rating times for training, and the rest for testing). We name the first configuration Fix and the second CC. There are obvious differences between these two evaluation methodologies: whereas in CC the test set is always (for every user) after the training set, in Fix this may not be the case; besides, (almost) every user will be included in the test set of Fix and this will not be the case in CC. In other terms, the CC methodology better represents a real environment, where some users may not be active at some point, whereas with the Fix evaluation we can analyse the recommendations for all the users, not only the most active (or active in the last period of time) ones.

Using the terminology in [19], we report the results obtained following the TrainItems strategy to select the candidate items to be ranked by each algorithm; that is, a ranking is generated for each user by predicting a score for every item that has a rating in the training set. We then compute standard Information Retrieval metrics on the ranking, considering as relevant every item rated with a 5 in the test split. We report here the values for precision and recall at 5 and 50, and nDCG at 5 and 10. We also report the user space coverage metric (cvg) as defined in [20], that is, the number of users for which the system is able to recommend at least one item. Complete code and evaluation scripts can be found in the following Bitbucket repository: PabloSanchezP/BFRecommendation.

3.2 Recommendation algorithms
We compare our methods against different well-known state-of-the-art recommenders. We report a popularity-based recommender (ItemPop) that recommends items based on their popularity in the system. We also include a classical nearest-neighbour recommender optimised for ranking (that is, without normalisation, as proposed in [5]) using the Jaccard coefficient as similarity (KNN). A modification of this algorithm is also tested, including an exponential time decay weight as introduced in [7] (TD). These three algorithms were based on the implementation found in the RankSys library (http://ranksys.org).

Additionally, we include some purely sequential-based algorithms as a comparison with the results reported in [10] (we use the same implementation as the one used in that paper). In the model named FMC (Factorised Markov Chains) the item-to-item transition matrix is factorised to capture the likelihood that an arbitrary user transitions from one item to another, using a first-order Markov chain. As an extension to this method, we include Factorised Personalized Markov Chains (FPMC), a method that combines Matrix Factorisation and first-order Markov Chains [17]. Finally, we also include as a baseline the Fossil method introduced in [10] (Factorised Sequential Prediction with Item Similarity Models), where Markov chains are combined with a similarity-based algorithm.

We compared these baselines against different instantiations of the rank aggregation formulation presented in Section 2. For the sake of space, we only report results when the backward-forward (BF) strategy is used to build the lists with candidate items, mostly due to its better performance with respect to the backward and forward strategies. We do experiment with a variation where the similarity is used to weight the contribution of each neighbour (as in standard user-based CF) and denote it as BFwCF; when no weight is used in the combination step we denote it as BFuCF.

Unless stated otherwise, we use 100 neighbours in KNN and TD, a λ factor of 1/200 in TD, and L = 1, K = 10, λ_Θ = 0.1, and α = 0.2 in FMC, FPMC, and Fossil, as specified in the original paper [10] for this dataset; that is, we have not performed an exhaustive search to find the optimal parameters of these algorithms.

3.3 Results and Discussion
As described previously, we test our approaches under two evaluation conditions that consider the temporal dimension differently when splitting the dataset into training and test. As shown in Table 1, the CC methodology is slightly more difficult than Fix, as evidenced by the lower values obtained in most of the metrics by the baseline algorithms. This observation is in line with previous results in the area [4]. It should be noted that CC replicates a situation closer to what we would find in an online experiment, where the test split is set in the future no matter which user we are considering; this is not true in Fix, where the test split of each user could exist at different timestamps, but on the other hand, every user contains the same number of test interactions, which may decrease the bias towards users more active in the most recent portion of the dataset.

We now assess the research questions RQ1 (Are neighbourhood-based recommenders competitive in temporal scenarios, especially when compared against methods based on matrix factorisation or Markov chains?) and RQ2 (Is there any advantage in using a rank aggregation formulation for this problem?) raised at the beginning of the paper, in light of the results summarised in Table 1. To address RQ1 we compare the performance of FMC, FPMC, and Fossil (combinations of Markov chains and matrix factorisation) against the KNN baseline. We observe that, in contrast to the original paper [10], Fossil is not always the best performing method amongst these baselines (this only holds when using the CC methodology). Actually, the results obtained for these methods are worse than KNN in both methodologies. We argue that a possible reason for this inconsistency with respect to the previously reported results is that here we evaluate using a more common setting in the area (top-N recommendation) and not AUC, which these algorithms are optimised for. Furthermore, in [10] no classical recommender algorithms like KNN appear in their comparison.

Furthermore, it is interesting to note that a time decay modification of KNN does not improve under this setting unless many items are considered in the ranking, since TD outperforms KNN only for Precision@50 and Recall@50. In any case, it seems that basic KNN algorithms are competitive against state-of-the-art algorithms specifically designed to address the sequential recommendation
Table 1: Summary of comparative effectiveness, including improvement (∆) in terms of nDCG@5 with respect to two of the
baselines. Best values for each evaluation methodology and metric are in bold.
(a) CC methodology
Method Precision@5 nDCG@5 Recall@5 nDCG@10 Precision@50 Recall@50 cvg ∆ wrt KNN ∆ wrt Fossil
ItemPop 1.81E-04 8.89E-04 2.25E-03 1.21E-03 3.80E-04 4.69E-02 100.00% -144.08% -36.88%
KNN 2.29E-04 2.17E-03 2.17E-03 2.94E-03 4.59E-05 4.34E-03 100.00% – 43.92%
TD 2.29E-04 2.17E-03 2.17E-03 2.17E-03 6.88E-05 6.51E-03 100.00% 0.00% 43.92%
FMC 0.00E+00 0.00E+00 0.00E+00 4.49E-04 2.69E-05 1.22E-03 85.21% NA NA
FPMC 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.69E-05 1.22E-03 85.21% NA NA
Fossil 2.69E-04 1.22E-03 2.43E-03 1.22E-03 2.69E-05 2.43E-03 85.21% -78.31% –
BFuCF 2.29E-04 2.17E-03 2.17E-03 2.17E-03 6.88E-05 4.49E-03 100.00% 0.00% 43.92%
BFwCF 4.59E-04 3.10E-03 4.34E-03 3.10E-03 9.17E-05 6.66E-03 100.00% 30.10% 60.80%
(b) Fix methodology
Method Precision@5 nDCG@5 Recall@5 nDCG@10 Precision@50 Recall@50 cvg ∆ wrt KNN ∆ wrt Fossil
ItemPop 3.32E-04 1.28E-03 1.74E-03 2.12E-03 4.06E-04 2.19E-02 100.00% -200.02% -5.48%
KNN 1.05E-03 3.84E-03 5.56E-03 4.94E-03 3.83E-04 2.05E-02 97.20% – 64.84%
TD 3.15E-04 1.09E-03 1.99E-03 1.62E-03 2.97E-04 1.68E-02 97.20% -252.10% -23.79%
FMC 4.34E-04 1.39E-03 2.42E-03 2.27E-03 2.91E-04 1.57E-02 100.00% -176.32% 2.85%
FPMC 4.08E-04 1.08E-03 1.93E-03 1.67E-03 2.20E-04 1.10E-02 100.00% -255.14% -24.86%
Fossil 3.32E-04 1.35E-03 1.64E-03 2.28E-03 2.60E-04 1.38E-02 100.00% -184.43% –
BFuCF 1.05E-03 3.89E-03 5.56E-03 4.75E-03 3.62E-04 1.96E-02 97.20% 1.39% 65.33%
BFwCF 9.46E-04 3.48E-03 4.96E-03 4.65E-03 3.55E-04 1.91E-02 97.20% -10.50% 61.15%
problem, even beating the ItemPop recommender, which is frequently a very strong baseline due to the inherent popularity bias found in this type of systems [12].

We now focus on the models generated using the rank aggregation formulation to address RQ2. We observe that these models (BFuCF and BFwCF) show very positive results in both evaluation methodologies. In fact, we experimented with several instantiations of the framework described in Section 2. We found that the best normalisation method is the identity normaliser, since a rank-based approach or the standard normaliser [15] produces worse results (not reported here because of space constraints). Our preliminary results also showed that the best strategy to select the candidate items is generating lists as L±_{m1,m2}; because of this we report in Table 1 results for this strategy using m_1 = m_2 = 5, Jaccard as similarity, and 100 neighbours, so the comparison is as fair as possible with respect to the rest of the baselines, even though an exhaustive tuning of these parameters may achieve better performance values.

When the proposed approaches are used, a high improvement is achieved with respect to both the KNN and Fossil baselines; however, depending on the actual evaluation methodology our approach may actually degrade its performance (see the Fix methodology). In these cases, the coverage is limited by the coverage of the user similarity, which results in the same coverage as that obtained for the KNN and TD algorithms. It is interesting to observe that the largest improvement is obtained for the more realistic scenario, that is, the CC methodology. Regarding the difference between the similarity-weighted (BFwCF) and unweighted (BFuCF) versions of these methods, the conclusions are not clear, since this parameter seems to depend on how the split was performed. We hypothesise that the similarity values in the CC methodology are more meaningful because the timeline is the same for every user, even though the similarity used (Jaccard) does not take the temporal dimension into account.

4 CONCLUSIONS AND FUTURE WORK
In this paper we have presented a new formulation for neighbourhood-based recommendation that allows the temporal dimension to be integrated seamlessly and successfully, according to the reported experiments. The two research questions proposed have been answered positively, evidencing that this type of recommendation algorithm is competitive in temporal scenarios, outperforming recent state-of-the-art algorithms specifically tailored to sequential-based recommendation. Furthermore, when formulating the recommendation problem as an aggregation of several rankings and introducing the temporal dimension in the process, the performance clearly improves, by up to 30% with respect to another neighbourhood-based recommender that does not use the temporal component and by up to 65% with respect to a sequential-based baseline.

Since the framework introduced in this work is general enough to work with other aggregation functions, in the future we plan to explore the behaviour of our proposal when alternative aggregation functions – such as those based on the score distribution [14] – are used. Furthermore, an exhaustive analysis – with more datasets, baselines such as SVD++ [13] and BPR for implicit data [16], and evaluation methodologies – should be made to better understand each component of the proposed model, for instance, the number of items allowed to be selected after and before the last common interaction and the impact of the similarity when weighting the final result. An important aspect that deserves further research is the definition of sequence-aware similarity metrics [3], so that the temporal dimension can be considered when selecting the neighbours in the proposed approach.
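For concreteness, the per-neighbour selection around the last common interaction discussed above (the backward, forward, and backward-forward strategies of Section 2) can be sketched as follows; this is an illustrative reading with hypothetical data, not the released implementation, and it assumes lists are returned most-recent-first:

```python
def candidate_lists(history_v, items_u, m1=2, m2=2):
    """Backward (B), forward (F) and backward-forward (BF) candidate lists
    for neighbour v with respect to target user u.

    history_v: v's items in ascending timestamp order.
    items_u:   items already rated by the target user (never recommended).
    m1 / m2:   items kept before / after the last common interaction.
    """
    common_idx = [k for k, i in enumerate(history_v) if i in items_u]
    if not common_idx:
        return {"B": [], "F": [], "BF": []}
    n_star = common_idx[-1]  # last common interaction n*(v; u)
    before = [i for i in history_v[:n_star] if i not in items_u]
    after = [i for i in history_v[n_star + 1:] if i not in items_u]
    l_b = list(reversed(before))[:m1]  # backward: most recent first
    l_f = list(reversed(after))[:m2]   # forward: most recent first
    return {"B": l_b, "F": l_f, "BF": l_b + l_f}

# Hypothetical timeline loosely following Figure 1: i9 is the last
# common interaction between neighbour v1 and the target user u.
lists = candidate_lists(["i2", "i6", "i9", "i13", "i14"],
                        {"i7", "i9", "i10"}, m1=2, m2=2)
```

The resulting lists feed the rank-fusion step of Section 2, where they are normalised and combined (optionally weighted by the user-user similarity) into the final ranking.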
Acknowledgments
This work was funded by the national Spanish Government under project TIN2016-80630-P.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer, 191–226. https://doi.org/10.1007/978-1-4899-7637-6_6
[2] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval: the concepts and technology behind search (2nd ed.). Pearson Education Ltd., Harlow, England.
[3] Alejandro Bellogín and Pablo Sánchez. 2017. Collaborative Filtering based on Subsequence Matching: A New Approach. Submitted to Information Sciences (2017).
[4] Pedro G. Campos, Fernando Díez, and Iván Cantador. 2014. Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols. User Model. User-Adapt. Interact. 24, 1-2 (2014), 67–119.
[5] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In RecSys. ACM, 39–46.
[6] Christian Desrosiers and George Karypis. 2011. A Comprehensive Survey of Neighborhood-based Recommendation Methods. In Recommender Systems Handbook. 107–144.
[7] Yi Ding and Xue Li. 2005. Time weight collaborative filtering. In CIKM. ACM, 485–492.
[8] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. 2001. Rank aggregation methods for the Web. In WWW. ACM, 613–622. https://doi.org/10.1145/371920.372165
[9] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19. https://doi.org/10.1145/2827872
[10] Ruining He and Julian McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In ICDM. IEEE, 191–200.
[11] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. 1999. An Algorithmic Framework for Performing Collaborative Filtering. In SIGIR. ACM, 230–237. https://doi.org/10.1145/312624.312682
[12] Dietmar Jannach, Lukas Lerche, Iman Kamehkhosh, and Michael Jugovac. 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User-Adapt. Interact. 25, 5 (2015), 427–491. https://doi.org/10.1007/s11257-015-9165-3
[13] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. ACM, 426–434. https://doi.org/10.1145/1401890.1401944
[14] R. Manmatha, Toni M. Rath, and Fangfang Feng. 2001. Modeling Score Distributions for Combining the Outputs of Search Engines. In SIGIR. ACM, 267–275. https://doi.org/10.1145/383952.384005
[15] M. Elena Renda and Umberto Straccia. 2003. Web Metasearch: Rank vs. Score Based Rank Aggregation Methods. In SAC. ACM, 841–846. https://doi.org/10.1145/952532.952698
[16] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461. https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1630&proceeding_id=25
[17] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW. ACM, 811–820.
[18] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In CSCW. ACM, 175–186. https://doi.org/10.1145/192844.192905
[19] Alan Said and Alejandro Bellogín. 2014. Comparative recommender system evaluation: benchmarking recommendation frameworks. In RecSys. ACM, 129–136.
[20] Guy Shani and Asela Gunawardana. 2011. Evaluating Recommendation Systems. In Recommender Systems Handbook. 257–297.
[21] Tong Zhao, Julian J. McAuley, and Irwin King. 2014. Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering. In CIKM. ACM, 261–270.