Revisiting Neighbourhood-Based Recommenders For Temporal Scenarios

Alejandro Bellogín, Universidad Autónoma de Madrid, Madrid, Spain, alejandro.bellogin@uam.es
Pablo Sánchez, Universidad Autónoma de Madrid, Madrid, Spain, pablo.sanchezp@estudiante.uam.es

ABSTRACT
Modelling the temporal context efficiently and effectively is essential to provide useful recommendations to users. Methods such as matrix factorisation and Markov chains have recently been combined to model the temporal preferences of users on a sequential basis. In this work, we focus on neighbourhood-based collaborative filtering and propose a simple technique that incorporates interaction sequences when producing a personalised ranking. We show the effectiveness of this method when compared against other sequence- and time-aware recommendation methods under two classical temporal evaluation methodologies.

1 INTRODUCTION
Recommender systems have become a necessary tool for a large number of online applications because of their ability to make personalised recommendations by adapting to user profiles. They are widely implemented in many online commercial platforms such as Amazon, Netflix, or Youtube, each of which may use different approaches: collaborative filtering, content-based methods, or hybrids of both. Since the purpose of these systems is to provide the best possible suggestions, different types of information can be added to the recommendations (location, type of product, time, ...), also known as contextual information. Among the different contexts, temporal information is one of the most interesting to integrate into recommendation algorithms, because it is easy to capture and because it usually discriminates better than other dimensions [1]. Nonetheless, its formalisation in the area has typically been proposed as heuristic filters, both in terms of algorithmic approaches and evaluation strategies [4].

One of the earliest and most popular collaborative filtering approaches is the neighbourhood-based recommender, either user-based or item-based (in this article we focus on the user-based variation). These approaches are normally represented as an aggregation function of the ratings from the k most similar users for an item i; usually, this aggregation function is represented as [6]:

\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} r_{vi} \, w_{uv}}{\sum_{v \in N_i(u)} |w_{uv}|}    (1)

where w_uv is the weight (or similarity) between users u and v, and N_i(u) represents the neighbours of user u that have rated item i. Different normalisation functions can be applied to this formula, like mean centering or Z-score [11, 18].

In this work, we generalise this classical formulation – borrowing ideas from Aggregated Search and Information Retrieval in general [2] – to one where each neighbour provides a list of suggestions for each user, which are later combined into a single ranking. Under this perspective, modelling the temporal aspect of user preferences is straightforward – as we shall show here – and provides an intuitive rationale about what is being recommended and why.

In the remainder of this paper, we answer the following research questions: (RQ1) Are neighbourhood-based recommenders competitive in temporal scenarios, especially when compared against methods based on matrix factorisation or Markov chains? (RQ2) Is there any advantage in using a rank aggregation formulation for this problem? Furthermore, how can we incorporate temporal sequences into this formulation?

After presenting our proposed method in detail in the next section, we address these research questions experimentally on real interactions from the Epinions website, using two evaluation methodologies to derive the temporal split. As we shall see, the empirical results validate our approach, showing performance improvements over state-of-the-art memory-based alternatives and recent algorithms specifically tailored for sequential recommendation.

2 INTEGRATING TEMPORAL SEQUENCES IN NEIGHBOUR-BASED RECOMMENDERS
Neighbourhood-based recommenders can be revisited as ranking fusion algorithms where each neighbour contributes a ranking (of potentially relevant items for the target user) and the goal of the recommender system is to combine these rankings into one final output. In terms of Aggregated Search [8, 15], each neighbour would be denoted as a judge (in Information Retrieval, these judges are usually different search engines) who gives a complete ordering of all the alternative items to be ranked; each of these rankings is denoted as τ, and the final fused ranking as τ̂. Formally, the process of rank aggregation is divided into normalisation (where the scores or the ranks of τ are normalised into a common scale, w_τ(i), for each item i) and combination (where the normalised weights w_τ(i) are combined into one fused score).

There are several methods for each of these stages; see [15] for an in-depth review of the most prominent ones. Interestingly, by taking the identity normaliser for the scores (w_τ(i) = τ(i)) and the so-called CombSUM combiner (where the normalised weights are simply added for each item), with a preference weight for each ranking equal to the similarity between the neighbour and the target user, we obtain a linear combination of the normalised weights, which is equivalent to the classical formulation of a neighbourhood-based recommender.
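To make Equation 1 concrete, the following minimal sketch computes the user-based kNN score for a single item; every user, item, rating, and similarity value below is hypothetical and chosen only to keep the arithmetic easy to follow:

```python
def knn_predict(ratings, sims, u, i):
    """User-based kNN prediction (Equation 1): similarity-weighted
    average of the ratings given to item i by u's neighbours."""
    num = den = 0.0
    for v, items in ratings.items():
        if v == u or i not in items:
            continue  # keep only N_i(u): neighbours of u that rated i
        w = sims[u][v]
        num += items[i] * w
        den += abs(w)
    return num / den if den else 0.0

# Hypothetical toy data (not taken from the paper).
ratings = {
    "u": {"i1": 4.0},
    "v1": {"i1": 5.0, "i2": 3.0},
    "v2": {"i1": 2.0, "i2": 5.0},
}
sims = {"u": {"v1": 0.8, "v2": 0.4}}
print(round(knn_predict(ratings, sims, "u", "i2"), 3))  # (0.8*3 + 0.4*5) / 1.2
```

Here the full user base plays the role of the neighbourhood; in practice only the k most similar users would enter the sum.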
RecTemp '17, August 2017, Como, Italy
Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.

Figure 1: Example of user interactions in the movie domain. The yellow border corresponds to the last common interaction between u and each neighbour, the red border represents those movies included in L_2^-(v), and the green border those in L_2^+(v).

In fact, when we take the ratings of the neighbours into account, the "score" of user u for item i using CombSUM and the identity normaliser reproduces the numerator of Equation 1. In this situation, each ranking τ is composed of the item-rating pairs rated by a particular neighbour, excluding, as is standard practice in the community, those items already rated by the target user in training. Further extensions and ad-hoc modifications could be made to these normalisers and combiners so that other formulations of this problem – such as mean-centering or Z-score normalisation [6] – can be achieved.

Once we have reformulated the problem of neighbourhood-based recommendation as a ranking fusion technique, we can describe how to incorporate temporal information in the process. The main idea is that each neighbour finds her last common interaction with the target user and creates a ranking of her candidate alternatives around that item, taking into account the order in which she rated each of those alternatives. Although in this case we do not consider the actual moment of the interaction (i.e., we may end up recommending items from a neighbour whose last common item was rated a long time ago), this approach could easily be improved by filtering out neighbours whose last common item was rated before a certain threshold date; in this paper we do not explore this option and leave it as future work.
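The reformulation above can be sketched as a small rank-fusion routine: each neighbour contributes one ranking τ of item-rating pairs, the identity normaliser keeps the ratings as scores, and a similarity-weighted CombSUM adds them up per item while skipping items the target user already rated in training. Profiles and similarity values below are hypothetical; note that each fused score is exactly the numerator of Equation 1, which suffices when the recommender is optimised for ranking:

```python
from collections import defaultdict

def fuse_neighbour_rankings(profiles, sims, rated_by_target):
    """Similarity-weighted CombSUM with the identity normaliser:
    sum each neighbour's (rating * similarity) per item, excluding
    items already rated by the target user, then rank by fused score."""
    fused = defaultdict(float)
    for v, tau in profiles.items():          # tau: one ranking per neighbour
        for item, rating in tau:
            if item not in rated_by_target:  # standard training exclusion
                fused[item] += sims[v] * rating
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical neighbour profiles (item, rating) and similarities.
profiles = {"v1": [("i1", 5.0), ("i2", 3.0)], "v2": [("i2", 5.0), ("i3", 4.0)]}
sims = {"v1": 0.8, "v2": 0.4}
print(fuse_neighbour_rankings(profiles, sims, rated_by_target={"i1"}))
```

Swapping the normaliser or combiner in this routine yields the other formulations mentioned above.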
Note that the temporal aspect is considered twice in this model: it is used to involve the target user (through the last common interaction) in setting the actual moment (context) of the recommendation and, at the same time, to exploit the actual (temporal) order in which the neighbour interacted with the items. In the following, we present our model, consisting of a method to compute the last common interaction and of different strategies to exploit the order of the neighbour ratings.

The last common interaction between two users u and v is:

n^*(u; v) = \max_k \{ i_k \in I_u^t : i_k \in I_v^t \}    (2)

where I_u^t are the items rated by user u ordered by timestamp in ascending order (recent interactions appear later in the list), that is:

I_u^t = \mathrm{sort}(I_u, t) = \left( i_k^t \right)_{k=1}^{|I_u|}, \quad \text{with } t(i_k^t) < t(i_{k+1}^t)    (3)

Note that the last common interaction n* will not be symmetrical in general – that is, n*(u; v) ≠ n*(v; u) – since it makes reference to the preferences of the first user.

Once we have sorted the preferences by timestamp (I_u^t and I_v^t) and calculated the last common interactions (n*(u; v) and n*(v; u)) for target user u and neighbour v, we propose three strategies to build the lists of candidate items from each neighbour: (a) taking the most recent m items rated by the neighbour after the last common interaction (we denote this list as L_m^+(v) and the strategy as forward or F); (b) taking the most recent m items rated before the last common interaction (L_m^-(v), backward or B); and (c) concatenating the m1 items rated after and the m2 items rated before the last common interaction (L_{m1,m2}^±(v), backward-forward or BF).

In summary, when using this formalisation we obtain a model equivalent to classical formulations that can further incorporate the temporal information under different models.

Finally, let us illustrate the whole process with the example shown in Figure 1, from the movie domain. For the sake of simplicity, we do not include the users' ratings, so the reader should assume that all sequences are composed of items that the user has equally liked. In the case of movies, the temporal component is usually determinant, since newer movies tend to be consumed more often than older ones. In the presented example, user u is the user to whom we want to make the recommendations, and we represent three neighbours v1, v2, and v3, where v1 and v3 have 3 items in common with the target user whereas v2 shares 4.

According to these interactions, the candidate items generated with respect to the different strategies (limited to size 2) will be (considering that n*(v1; u) = i9, n*(v2; u) = i10, n*(v3; u) = i7):

L_2^+(v1) = (i14, i13),  L_2^-(v1) = (i6, i2),  L_{1,1}^±(v1) = (i14, i6)
L_2^+(v2) = (i12, i13),  L_2^-(v2) = (i2),      L_{1,1}^±(v2) = (i12, i2)
L_2^+(v3) = (i12, i15),  L_2^-(v3) = (i5, i6),  L_{1,1}^±(v3) = (i12, i5)

Let us now consider that items i12, i13, and i14 are in the test set (as mentioned before, newer films are more likely to be chosen by user u). A standard neighbourhood-based recommender which does not take the temporal aspect into account would probably recommend item i2, whereas in our approach this item appears only once for strategy L^± and twice for L^-, mostly in favour of the more recent movie i12.
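The construction of these candidate lists can be sketched as follows. The sequences below are hypothetical (the full timelines behind Figure 1 are not reproduced here), and the sketch assumes, following the worked example, that the lists are returned most-recent-first, that the BF list concatenates the forward items with the backward ones, and that at least one common item exists:

```python
def candidate_lists(u_seq, v_seq, m1, m2):
    """Build the forward (F), backward (B) and backward-forward (BF)
    candidate lists from a neighbour's sequence. Both sequences are
    ordered by timestamp, oldest first."""
    rated_by_u = set(u_seq)
    # position in v's timeline of the last common interaction n*(v; u)
    n_star = max(k for k, item in enumerate(v_seq) if item in rated_by_u)
    # candidate items only: drop items already rated by the target user
    before = [i for i in v_seq[:n_star] if i not in rated_by_u]
    after = [i for i in v_seq[n_star + 1:] if i not in rated_by_u]
    l_forward = list(reversed(after))[:m1]    # most recent m1 after n*
    l_backward = list(reversed(before))[:m2]  # most recent m2 before n*
    return l_forward, l_backward, l_forward + l_backward

# Hypothetical sequences: u and v share items "a" and "b" ("b" is the
# last common interaction in v's timeline).
u_seq = ["a", "b", "c"]
v_seq = ["x", "a", "y", "b", "z", "w"]
print(candidate_lists(u_seq, v_seq, m1=2, m2=2))
```

Each returned list would then be fused with the lists of the remaining neighbours, as described in the previous section.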
More specifically, let I^t(v; u) = \mathrm{sort}(I_v \setminus I_u, t) be the timestamp-ordered sequence of candidate items of neighbour v (those not rated by the target user u). The lists above are then generated as follows:

L_m^{+}(v) = \left( i_k^t \right)_{k=n^*}^{n^*+m}, \quad i_k^t \in I^t(v; u)    (4)

L_m^{-}(v) = \left( i_k^t \right)_{k=n^*-m}^{n^*}, \quad i_k^t \in I^t(v; u)    (5)

L_{m_1,m_2}^{\pm}(v) = \left( L_{m_1}^{+}(v), L_{m_2}^{-}(v) \right)    (6)

Therefore, for each neighbour we obtain a list L(v) with all the candidate items from that neighbour, which will later be normalised and combined, as explained before, to produce a single ranking containing the recommendations for the target user. Moreover, we believe that moving forward from the last common interaction is more useful in terms of recommendation performance – especially for novelty purposes –, as evidenced by the strategies L^+ and L^± recommending i13 and i14 in the example above.

3 EMPIRICAL EVALUATION

3.1 Evaluation Methodology
We test the proposed approach on a dataset collected from Epinions.com by the authors of [21], also used in [10]. It includes all actions of all users on the website from January 2001 to November 2013, for a total of 193,571 actions on 42,447 items by 117,323 users; it hence has a density of 0.004%.¹ This dataset naturally fits the purpose of exploiting the temporal dimension of user preferences, since it represents an unbiased sample of the website; other datasets more common in the area – like MovieLens – are not well suited because they have been filtered or their timestamps are artificial [9].

¹ Note these statistics do not match those from [10] or [21] because we do not put any constraint on the minimum number of ratings of users and items.

As described in [4], there are several evaluation conditions worth exploring when evaluating time-aware recommender systems. In this work, we use a time-dependent rating order (the timestamps of the test split for each user occur after those of the training split) in two evaluation methodologies: one with a user-centred base set and a fixed-size condition (the last 2 actions of each user with at least 4 actions are included in the test split), and another with a community-centred base set and a proportion-based size condition (the same timestamp is used for all users, in such a way that we retain the data corresponding to the 80% most recent rating times for training, and the rest for testing). We name the first configuration Fix and the second CC. There are obvious differences between these two evaluation methodologies: whereas in CC the test set is always (for every user) after the training set, in Fix this may not be the case; besides, (almost) every user will be included in the test set of Fix, which will not be the case in CC. In other terms, the CC methodology better represents a real environment, where some users may not be active at a given point, whereas with the Fix evaluation we can analyse the recommendations for all the users, not only the most active (or active in the last period of time) ones.

Using the terminology in [19], we report the results obtained by following the TrainItems strategy to select the candidate items to be ranked by each algorithm; that is, a ranking is generated for each user by predicting a score for every item that has a rating in the training set. We then compute standard Information Retrieval metrics on the ranking, considering as relevant every item rated with a 5 in the test split. We report here the values for precision and recall at 5 and 50, and nDCG at 5 and 10. We also report the user space coverage metric (cvg) as defined in [20], that is, the number of users for whom the system is able to recommend at least one item. Complete code and evaluation scripts can be found in the following Bitbucket repository: PabloSanchezP/BFRecommendation.

3.2 Recommendation Algorithms
We compare our methods against different well-known state-of-the-art recommenders. We report a popularity-based recommender (ItemPop) that recommends items based on their popularity in the system. We also include a classical nearest-neighbour recommender optimised for ranking (that is, without normalisation, as proposed in [5]) using the Jaccard coefficient as similarity (KNN). A modification of this algorithm is also tested that includes an exponential time decay weight, as introduced in [7] (TD). These three algorithms are based on the implementations found in the RankSys² library.

² http://ranksys.org

Additionally, we include some purely sequential algorithms as a comparison with the results reported in [10] (we use the same implementation as that paper). In the model named FMC (Factorised Markov Chains), the item-to-item transition matrix is factorised to capture the likelihood that an arbitrary user transitions from one item to another, using a first-order Markov chain. As an extension to this method, we include Factorised Personalised Markov Chains (FPMC), a method that combines matrix factorisation and first-order Markov chains [17]. Finally, we also include as a baseline the Fossil method introduced in [10] (Factorised Sequential Prediction with Item Similarity Models), where Markov chains are combined with a similarity-based algorithm.

We compare these baselines against different instantiations of the rank aggregation formulation presented in Section 2. For the sake of space, we only report results when the backward-forward (BF) strategy is used to build the lists of candidate items, mostly due to its better performance with respect to the backward and forward strategies. We do experiment with a variation where the similarity is used to weight the contribution of each neighbour (as in standard user-based CF) and denote it as BFwCF; when no weight is used in the combination step, we denote it as BFuCF. Unless stated otherwise, we use 100 neighbours in KNN and TD, a λ factor of 1/200 in TD, and L = 1, K = 10, λ_Θ = 0.1, and α = 0.2 in FMC, FPMC, and Fossil, as specified in the original paper [10] for this dataset; that is, we have not performed an exhaustive search to find the optimal parameters of these algorithms.

3.3 Results and Discussion
As described previously, we test our approaches under two evaluation conditions that consider the temporal dimension differently when splitting the dataset into training and test sets. As shown in Table 1, the CC methodology is slightly more difficult than Fix, as evidenced by the lower values obtained by the baseline algorithms in most of the metrics. This observation is in line with previous results in the area [4]. It should be noted that CC replicates a situation closer to what we would find in an online experiment, where the test split is set in the future no matter which user we are considering; this is not true in Fix, where the test split of each user could exist at different timestamps. On the other hand, in Fix every user contains the same number of test interactions, which may decrease the bias towards users who are more active in the most recent portion of the dataset.

We now assess the research questions RQ1 and RQ2 raised at the beginning of the paper, in light of the results summarised in Table 1. To address RQ1, we compare the performance of FMC, FPMC, and Fossil (combinations of Markov chains and matrix factorisation) against the KNN baseline. We observe that, in contrast to the original paper [10], Fossil is not always the best-performing method amongst these baselines (this only holds when using the CC methodology). Actually, the results obtained for these methods are worse than those of KNN in both methodologies. We argue that a possible reason for this inconsistency with respect to the previously reported results is that here we evaluate using a more common setting in the area (top-N recommendation) rather than AUC, which these algorithms are optimised for. Furthermore, in [10] no classical recommender algorithms like KNN appear in the comparison.

It is also interesting to note that a time decay modification of KNN does not improve under this setting unless many items are considered in the ranking, since TD outperforms KNN only for Precision@50 and Recall@50. In any case, it seems that basic KNN algorithms are competitive against state-of-the-art algorithms specifically designed to address sequential recommendation.

Table 1: Summary of comparative effectiveness, including the improvement (∆) in terms of nDCG@5 with respect to two of the baselines. Best values for each evaluation methodology and metric are marked with an asterisk (*).
(a) CC methodology

| Method  | Precision@5 | nDCG@5    | Recall@5  | nDCG@10   | Precision@50 | Recall@50 | cvg      | ∆ wrt KNN | ∆ wrt Fossil |
|---------|-------------|-----------|-----------|-----------|--------------|-----------|----------|-----------|--------------|
| ItemPop | 1.81E-04    | 8.89E-04  | 2.25E-03  | 1.21E-03  | 3.80E-04*    | 4.69E-02* | 100.00%* | -144.08%  | -36.88%      |
| KNN     | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.94E-03  | 4.59E-05     | 4.34E-03  | 100.00%* | –         | 43.92%       |
| TD      | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.17E-03  | 6.88E-05     | 6.51E-03  | 100.00%* | 0.00%     | 43.92%       |
| FMC     | 0.00E+00    | 0.00E+00  | 0.00E+00  | 4.49E-04  | 2.69E-05     | 1.22E-03  | 85.21%   | NA        | NA           |
| FPMC    | 0.00E+00    | 0.00E+00  | 0.00E+00  | 0.00E+00  | 2.69E-05     | 1.22E-03  | 85.21%   | NA        | NA           |
| Fossil  | 2.69E-04    | 1.22E-03  | 2.43E-03  | 1.22E-03  | 2.69E-05     | 2.43E-03  | 85.21%   | -78.31%   | –            |
| BFuCF   | 2.29E-04    | 2.17E-03  | 2.17E-03  | 2.17E-03  | 6.88E-05     | 4.49E-03  | 100.00%* | 0.00%     | 43.92%       |
| BFwCF   | 4.59E-04*   | 3.10E-03* | 4.34E-03* | 3.10E-03* | 9.17E-05     | 6.66E-03  | 100.00%* | 30.10%    | 60.80%       |

(b) Fix methodology

| Method  | Precision@5 | nDCG@5    | Recall@5  | nDCG@10   | Precision@50 | Recall@50 | cvg      | ∆ wrt KNN | ∆ wrt Fossil |
|---------|-------------|-----------|-----------|-----------|--------------|-----------|----------|-----------|--------------|
| ItemPop | 3.32E-04    | 1.28E-03  | 1.74E-03  | 2.12E-03  | 4.06E-04*    | 2.19E-02* | 100.00%* | -200.02%  | -5.48%       |
| KNN     | 1.05E-03*   | 3.84E-03  | 5.56E-03* | 4.94E-03* | 3.83E-04     | 2.05E-02  | 97.20%   | –         | 64.84%       |
| TD      | 3.15E-04    | 1.09E-03  | 1.99E-03  | 1.62E-03  | 2.97E-04     | 1.68E-02  | 97.20%   | -252.10%  | -23.79%      |
| FMC     | 4.34E-04    | 1.39E-03  | 2.42E-03  | 2.27E-03  | 2.91E-04     | 1.57E-02  | 100.00%* | -176.32%  | 2.85%        |
| FPMC    | 4.08E-04    | 1.08E-03  | 1.93E-03  | 1.67E-03  | 2.20E-04     | 1.10E-02  | 100.00%* | -255.14%  | -24.86%      |
| Fossil  | 3.32E-04    | 1.35E-03  | 1.64E-03  | 2.28E-03  | 2.60E-04     | 1.38E-02  | 100.00%* | -184.43%  | –            |
| BFuCF   | 1.05E-03*   | 3.89E-03* | 5.56E-03* | 4.75E-03  | 3.62E-04     | 1.96E-02  | 97.20%   | 1.39%     | 65.33%       |
| BFwCF   | 9.46E-04    | 3.48E-03  | 4.96E-03  | 4.65E-03  | 3.55E-04     | 1.91E-02  | 97.20%   | -10.50%   | 61.15%       |

Indeed, the basic KNN approach even beats the ItemPop recommender, which is frequently a very strong baseline due to the inherent popularity bias found in this type of system [12].

We now focus on the models generated using the rank aggregation formulation to address RQ2. We observe that these models (BFuCF and BFwCF) show very positive results in both evaluation methodologies. In fact, we experimented with several instantiations of the framework described in Section 2. We found that the best normalisation method is the identity normaliser, since a rank-based approach or the standard normaliser [15] produces worse results (not reported here because of space constraints). Our preliminary results also showed that the best strategy to select the candidate items is to generate lists as L_{m1,m2}^±; because of this, we report in Table 1 the results for this strategy using m1 = m2 = 5, Jaccard as similarity, and 100 neighbours, so that the comparison is as fair as possible with respect to the rest of the baselines, even though an exhaustive tuning of these parameters might achieve better performance values.

When the proposed approaches are used, a high improvement is achieved with respect to both the KNN and Fossil baselines; however, depending on the actual evaluation methodology, our approach may actually degrade performance (see the Fix methodology). In these cases, the coverage is limited by the coverage of the user similarity, which results in the same coverage as that obtained by the KNN and TD algorithms. It is interesting to observe that the largest improvement is obtained for the more realistic scenario, that is, the CC methodology. Regarding the difference between the similarity-weighted (BFwCF) and unweighted (BFuCF) versions of these methods, the conclusions are not clear, since the benefit of this weighting seems to depend on how the split was performed. We hypothesise that the similarity values in the CC methodology are more meaningful because the timeline is the same for every user, even though the similarity used (Jaccard) does not take the temporal dimension into account.

4 CONCLUSIONS AND FUTURE WORK
In this paper we have presented a new formulation for neighbourhood-based recommendation that allows the temporal dimension to be integrated seamlessly and, according to the reported experiments, successfully. The two research questions posed have been answered positively, evidencing that this type of recommendation algorithm is competitive in temporal scenarios, outperforming recent state-of-the-art algorithms specifically tailored to sequential recommendation. Furthermore, when the recommendation problem is formulated as an aggregation of several rankings and the temporal dimension is introduced in the process, the performance clearly improves: by up to 30% with respect to a neighbourhood-based recommender that does not use the temporal component, and by up to 65% with respect to a sequential baseline.

Since the framework introduced in this work is general enough to work with other aggregation functions, in the future we plan to explore the behaviour of our proposal when alternative aggregation functions – such as those based on the score distribution [14] – are used. Furthermore, an exhaustive analysis – with more datasets, more baselines such as SVD++ [13] and BPR for implicit data [16], and more evaluation methodologies – should be made to better understand each component of the proposed model, for instance, the number of items allowed to be selected after and before the last common interaction, and the impact of the similarity when weighting the final result. An important aspect that deserves further research is the definition of sequence-aware similarity metrics [3], so that the temporal dimension can also be considered when selecting the neighbours in the proposed approach.

Acknowledgments
This work was funded by the Spanish Government under project TIN2016-80630-P.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer, 191–226. https://doi.org/10.1007/978-1-4899-7637-6_6
[2] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval – the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England.
[3] Alejandro Bellogín and Pablo Sánchez. 2017. Collaborative Filtering based on Subsequence Matching: A New Approach. Submitted to Information Sciences (2017).
[4] Pedro G. Campos, Fernando Díez, and Iván Cantador. 2014. Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols. User Model. User-Adapt. Interact. 24, 1-2 (2014), 67–119.
[5] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In RecSys. ACM, 39–46.
[6] Christian Desrosiers and George Karypis. 2011. A Comprehensive Survey of Neighborhood-based Recommendation Methods. In Recommender Systems Handbook. 107–144.
[7] Yi Ding and Xue Li. 2005. Time weight collaborative filtering. In CIKM. ACM, 485–492.
[8] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. 2001. Rank aggregation methods for the Web. In WWW. ACM, 613–622. https://doi.org/10.1145/371920.372165
[9] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19. https://doi.org/10.1145/2827872
[10] Ruining He and Julian McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In ICDM. IEEE, 191–200.
[11] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. 1999. An Algorithmic Framework for Performing Collaborative Filtering. In SIGIR. ACM, 230–237. https://doi.org/10.1145/312624.312682
[12] Dietmar Jannach, Lukas Lerche, Iman Kamehkhosh, and Michael Jugovac. 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User-Adapt. Interact. 25, 5 (2015), 427–491. https://doi.org/10.1007/s11257-015-9165-3
[13] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. ACM, 426–434. https://doi.org/10.1145/1401890.1401944
[14] R. Manmatha, Toni M. Rath, and Fangfang Feng. 2001. Modeling Score Distributions for Combining the Outputs of Search Engines. In SIGIR. ACM, 267–275. https://doi.org/10.1145/383952.384005
[15] M. Elena Renda and Umberto Straccia. 2003. Web Metasearch: Rank vs. Score Based Rank Aggregation Methods. In SAC. ACM, 841–846. https://doi.org/10.1145/952532.952698
[16] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461.
[17] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW. ACM, 811–820.
[18] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In CSCW. ACM, 175–186. https://doi.org/10.1145/192844.192905
[19] Alan Said and Alejandro Bellogín. 2014. Comparative recommender system evaluation: benchmarking recommendation frameworks. In RecSys. ACM, 129–136.
[20] Guy Shani and Asela Gunawardana. 2011. Evaluating Recommendation Systems. In Recommender Systems Handbook. 257–297.
[21] Tong Zhao, Julian J. McAuley, and Irwin King. 2014. Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering. In CIKM. ACM, 261–270.