Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation

Anas Alzogbi

Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany

Time is an important aspect of recommender systems. Its impact ranges from changes in user interest to the dynamics of adding new users and items to the system. In this work, we present a time-aware recommender system that accounts for the concept-drift in user interest. By computing a user-specific concept-drift score, our model controls which ratings should have more influence when learning the recommender model. We consider the use case of scientific paper recommendation and conduct experiments on a real-world dataset from citeulike. The results show that the proposed model outperforms the state-of-the-art methods on all reported metrics. They additionally show that conducting time-aware evaluations is essential for a realistic assessment of a recommender system.

Keywords: Time-aware RS · Hybrid Recommendation · Latent Dirichlet Allocation (LDA) · Matrix Factorization · Scientific paper recommendation

1 Introduction

Collaborative filtering (CF) in general, and matrix factorization (MF) in particular, has gained a lot of attention in the last decade as a recommendation technique. Since MF showed promising results in generating recommendations [10], a growing number of works have adopted it for CF recommenders. A successful approach that builds on matrix factorization and has recently gained considerable interest is Collaborative Topic Regression (CTR) for recommending scientific articles [19]. CTR leverages not only collaborative ratings but also the articles' textual content to learn the latent models for users and items. Several works pushed CTR further in different directions, for example by adapting CTR to consider item tags [20], employing autoencoders for better latent topic modeling [21], or considering the word order in the textual content [2].

Although these works demonstrate appealing results, their evaluations ignore an important aspect: the temporal nature of recommender systems. Offline evaluations that do not respect the chronological order of user ratings when splitting the data into training and test sets allow the model to learn from future data. This happens whenever the split procedure does not guarantee that all training data points precede all test data points in time. We call such evaluations "time-ignorant" and those that obey the temporal order "time-aware". Previous works [4,14] showed that time-ignorant evaluations promise unrealistic performance, whereas time-aware evaluations better simulate real-world scenarios and therefore provide more realistic results. The difference in results between time-aware and time-ignorant evaluations can be explained by the "concept-drift" in user interest, i.e., the change of user interest over time. In this paper, we show that the performance of CTR drops significantly when it is evaluated under a time-aware evaluation framework on a real-world dataset. This motivates, on the one hand, applying time-aware evaluations to assess the quality of a recommender system and, on the other hand, extending CTR to consider temporal aspects, which is our main contribution in this work.

Concept-drift in user interest over time is a widely known aspect of building real-world recommender systems. It can be observed in various applications, for example the recommendation of news, books, and scientific papers. We distinguish between two models of temporal influence on user behavior: (a) time as context [18], where user habits repeat regularly at certain intervals. Here, the time value (weekend, evening, summer, etc.) at prediction time plays an important role in determining the user interest; and (b) time as an aging factor, where time diminishes the relevance of old user interactions (ratings). As time elapses, a user's old interactions become less representative of the user's current interest. In contrast to the "time-as-context" model, here the age of a user interaction determines its importance in defining the current user interest. Concept-drift is related to the latter model, and in this work we look at time from this perspective: time as an aging factor that drives the drift in user interest.

The role of concept-drift in recommender systems has been addressed by a wide range of previous works [18]. A common strategy is to apply a forgetting mechanism, in which old ratings are down-weighted so that their contribution to computing the current user model is reduced. This is achieved by using a time-decay function to compute a weight for each rating: the older the rating, the lower its weight. However, the steepness of the decay function is regulated by a damping (forgetting) factor, and the specific value of this factor is usually set empirically, as in [17,14,1,5,11].

In this work, we bring the time aspect to CTR. We present Time-aware Collaborative Topic Regression (T-CTR), a recommendation method that applies a forgetting strategy to account for the concept-drift in user interest over time. In contrast to existing works, T-CTR emphasizes the fact that users differ in how their interest drifts: some users tend to change their interest faster than others. Therefore, we compute a personalized concept-drift score for each user, a score that quantifies the user's tendency to change his/her interest as time goes on. We then use the user's concept-drift score as a forgetting factor to compute a weight for each observed rating. The main contributions of this paper are:

- A time-aware hybrid recommender system for textual items (items associated with text content) that dynamically accounts for the concept-drift in user interest by leveraging the textual content of rated items.
- An experimental study on a real-world dataset that explores the differences between time-aware and time-ignorant evaluation methods when evaluating recommender systems.
- A real-world dataset that enables conducting time-aware offline evaluations.

The remainder of this paper is organized as follows: in Section 2, we review the most relevant existing works; in Section 3, we introduce our notation and the important preliminaries; in Section 4, we present our method; in Section 5, we explain the conducted experiments and analyze the findings; finally, we conclude in Section 6.

2 Related Work

Several works addressed the role of time and concept-drift in recommender systems [18]. Koren [9] introduced timeSVD++, a matrix factorization method that learns time-based biases along with the user and item latent factors. The time period of the ratings is divided into bins (time intervals) and a bias is learned for each bin. This method can compute predictions only for time intervals that appear in the training phase; it is therefore not applicable in a time-aware evaluation setup. A more recent approach fits a time series model that learns from historical ratings how the users' latent models evolve over time [12,13,6]. This approach involves refitting the latent models at each time interval and afterwards fitting an auto-regressive model that finds the linear correlation between the current user latent model and the previous ones. This process adds extra complexity to the recommendation algorithm. Additionally, as we show in Subsection 5.4, these methods do not produce good results when the underlying data has few ratings within small intervals. Another strategy for considering temporal influence, which is similar to ours, is to apply a forgetting mechanism in which old ratings are either discarded or down-weighted based on a forgetting factor [5,1,17,11]. In these works, the forgetting factor is set empirically, whereas in our work we compute an individual value for each user dynamically (cf. Subsection 4.1). The time aspect in matrix factorization was also addressed in [14], which suggested a stream-based algorithm for updating user and item preference models in an online fashion. As new ratings arrive, old ratings are considered obsolete, which triggers either refitting the learned models or penalizing old models. The authors also suggested several strategies for dealing with old ratings. A key difference in our work is that we leverage the items' textual content for estimating user-specific concept-drift scores.

3 Problem Statement and Preliminaries

Before explaining the details of our method, we introduce some notation and give a brief explanation about important background information relevant to our method: matrix factorization and collaborative topic regression.

3.1 Notation and Problem Statement

Let $U = \{u_1, \dots, u_n\}$ be the set of users and $I = \{i_1, \dots, i_m\}$ the set of items. We assume each item has textual content and is associated with a bag-of-words representation over the set of domain-related vocabulary. Additionally, each user has a set of relevant items, recorded in the rating matrix $R \in \mathbb{R}^{n \times m}$. An entry $R_{ui}$ has a value of 1 if user $u$ is interested in item $i$; otherwise $R_{ui} = 0$. We assume the one-class scenario where only relevant items are known. Therefore, zero values in $R$ do not necessarily represent negative ratings; they may also represent unknown ratings. Each rating $R_{ui}$ is associated with a timestamp $t_{R_{ui}}$ that records the time when user $u$ rated item $i$. Given $U$, $I$, and $R$, the goal is to predict for every user $u \in U$ at a given time $T$ the set of top $M$ relevant items from $I$.

3.2 Matrix Factorization for Collaborative Filtering

Matrix factorization (MF) is one of the most successful recommendation methods for model-based collaborative filtering [10,15]. The main idea is to factorize the incomplete rating matrix $R$ into two matrices with a joint latent low-dimensional space of dimension $k$: the users latent matrix $U \in \mathbb{R}^{n \times k}$ and the items latent matrix $V \in \mathbb{R}^{m \times k}$, where each user $u$ and each item $i$ are represented by latent vectors $U_u \in \mathbb{R}^k$ and $V_i \in \mathbb{R}^k$, respectively.

Zero values in the rating matrix do not necessarily denote negative ratings; they may also denote unknown or missing ratings. Therefore, they should contribute less to the learning process than known ratings. This is the well-known one-class problem, where only positive ratings are available. To solve this problem, confidence weights are introduced [16,7], where zero values are weighted by a small value $b$ and non-zero values are weighted by a larger value $a$ such that $a > b > 0$. Given $R$, a matrix factorization algorithm finds $U$ and $V$ that minimize the following objective function, the regularized squared reconstruction error with confidence weights:

\[
\operatorname*{argmin}_{U,V} \sum_{u \in U,\, i \in I} C_{ui} \left(R_{ui} - U_u^T V_i\right)^2 + \lambda_u \sum_{u \in U} \|U_u\|^2 + \lambda_v \sum_{i \in I} \|V_i\|^2 \tag{1}
\]

where $\lambda_u$, $\lambda_v$ are the regularization parameters and $C_{ui}$ is the confidence weight of the rating $R_{ui}$:

\[
C_{ui} = \begin{cases} a, & \text{if } R_{ui} \neq 0 \\ b, & \text{otherwise} \end{cases} \tag{2}
\]

After finding $U$ and $V$, we can estimate the affinity of user $u$ towards item $i$ by the dot product of their latent factors:

\[
\hat{R}_{ui} = U_u^T V_i \tag{3}
\]
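As an illustration, the objective in Equation 1, the confidence weights in Equation 2, and the prediction rule in Equation 3 can be sketched in plain Python. The function names and the default values a = 1 and b = 0.01 are assumptions made for this sketch and are not taken from the text; the sketch only evaluates the objective and does not minimize it.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two latent vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def confidence(r_ui: float, a: float = 1.0, b: float = 0.01) -> float:
    """Equation 2: weight a for known ratings, b for zero entries (a > b > 0).

    The defaults are illustrative assumptions.
    """
    return a if r_ui != 0 else b


def objective(R: Matrix, U: Matrix, V: Matrix,
              lam_u: float, lam_v: float,
              a: float = 1.0, b: float = 0.01) -> float:
    """Equation 1: confidence-weighted squared error plus L2 regularization."""
    error = sum(
        confidence(r, a, b) * (r - dot(U[u], V[i])) ** 2
        for u, row in enumerate(R)
        for i, r in enumerate(row)
    )
    reg_u = lam_u * sum(dot(x, x) for x in U)
    reg_v = lam_v * sum(dot(x, x) for x in V)
    return error + reg_u + reg_v


def predict(U: Matrix, V: Matrix, u: int, i: int) -> float:
    """Equation 3: predicted affinity of user u towards item i."""
    return dot(U[u], V[i])
```

Because unknown entries carry the small weight b, a model that predicts a non-zero score for an unrated item is penalized only lightly, which is the intended effect of the one-class weighting.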

3.3 Collaborative Topic Regression (CTR)

CTR [19] is a hybrid recommendation approach that builds on MF and extends it to benefit from the items' textual content. It adopts matrix factorization for the one-class problem. Additionally, it assumes that the items' latent vectors are generated from a topic model, specifically Latent Dirichlet Allocation (LDA) [3]. LDA is a topic modeling algorithm that finds a set of topics for a set of documents. Let $\theta \in \mathbb{R}^{m \times k}$ be the matrix of $k$ latent topics extracted from a set of $m$ items by LDA, where $\theta_i \in \mathbb{R}^k$ is the topic vector of item $i$. CTR factorizes the rating matrix $R$ into the users latent matrix $U$ and the items latent matrix $V$, such that $V$ consists of the latent topic vectors extracted by LDA with an added offset: $V_i = \theta_i + \epsilon_i$. The offset $\epsilon_i$ represents how much of the prediction relies on item $i$'s content and how much it relies on other users' ratings of item $i$. To solve the matrix factorization, CTR applies a probabilistic model as in [15] that aims at maximizing the log-likelihood of the model variables $U$ and $V$ (see footnote 1):

\[
L = -\frac{\lambda_u}{2} \sum_{u \in U} U_u^T U_u - \frac{\lambda_v}{2} \sum_{i \in I} (V_i - \theta_i)^T (V_i - \theta_i) - \sum_{u \in U,\, i \in I} \frac{C_{ui}}{2} \left(R_{ui} - U_u^T V_i\right)^2 \tag{4}
\]

The algorithm to maximize $L$ alternates the parameter optimization between $U$ and $V$ until convergence. Each time, one parameter is fixed to its current estimate and the other is optimized by differentiation, which leads to the following analytic solutions:

\[
U_u \leftarrow (V^T C_u V + \lambda_u I)^{-1} V^T C_u R_u \tag{5}
\]

\[
V_i \leftarrow (U^T C_i U + \lambda_v I)^{-1} (U^T C_i R_i + \lambda_v \theta_i) \tag{6}
\]

where $I$ is the $k \times k$ identity matrix, $C_u \in \mathbb{R}^{m \times m}$ and $C_i \in \mathbb{R}^{n \times n}$ are diagonal matrices with $C_{u1}, \dots, C_{um}$ and $C_{1i}, \dots, C_{ni}$ on their diagonals, respectively, and $R_u \in \mathbb{R}^m$ is the vector that contains $u$'s preferences. Similarly, $R_i \in \mathbb{R}^n$ is the vector that contains the preferences for item $i$.
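As a worked step, the update for $U_u$ can be obtained by setting the gradient of the log-likelihood to zero. The sketch below assumes the usual CTR log-likelihood with $\theta$ held fixed, in which the terms involving $U_u$ are $-\frac{\lambda_u}{2} U_u^T U_u - \sum_{i \in I} \frac{C_{ui}}{2}(R_{ui} - U_u^T V_i)^2$. The derivation for $V_i$ is analogous and picks up the additional $\lambda_v \theta_i$ term because its regularizer is centered at $\theta_i$ rather than at zero.

```latex
\begin{align*}
\frac{\partial L}{\partial U_u}
  &= -\lambda_u U_u + \sum_{i \in I} C_{ui}\,\bigl(R_{ui} - U_u^T V_i\bigr)\,V_i = 0 \\
\Rightarrow\quad
  \Bigl(\sum_{i \in I} C_{ui}\, V_i V_i^T + \lambda_u I\Bigr)\, U_u
  &= \sum_{i \in I} C_{ui}\, R_{ui}\, V_i \\
\Rightarrow\quad
  \bigl(V^T C_u V + \lambda_u I\bigr)\, U_u
  &= V^T C_u R_u
\end{align*}
```

The last line uses the fact that the rows of $V$ are the vectors $V_i^T$ and that $C_u$ is diagonal, so $\sum_{i} C_{ui} V_i V_i^T = V^T C_u V$ and $\sum_{i} C_{ui} R_{ui} V_i = V^T C_u R_u$.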

After finding $U$ and $V$, we approximate the missing ratings using Equation 3.

4 Time-aware Collaborative Topic Regression (T-CTR)

To account for the temporal aspect, we propose T-CTR. Our approach is a hybrid recommender system that accounts for concept-drift in user interest. It learns the users' and items' latent models seamlessly from the items' textual content and the users' ratings. Additionally, we impose the influence of time on the model by extending the role of the confidence weights. As we have seen in the previous section, confidence weights are employed to give known ratings more importance than unknown ratings. In T-CTR, we give confidence weights an additional task: expressing different importance levels for different known ratings. As mentioned earlier, the older a rating gets, the less important it becomes in representing the current user interest. Therefore, we make old ratings weigh less than recent ones. Moreover, the aging process of ratings is user-specific, i.e., different users exhibit different concept-drift behavior. For example, consider two users A and B, where A tends to change the topics of interest more rapidly than B. Assume that both users have 6-month-old ratings, $R_{A,i}$ and $R_{B,j}$ respectively. Although these ratings have the same age, knowing that A changes the topics of interest more rapidly than B gives $R_{B,j}$ more importance in representing the current interest model of B than $R_{A,i}$ has in representing the current interest model of A. To account for this difference in user behavior, we calculate a per-user concept-drift score. This score quantifies the user's tendency to change his/her interests as time goes on. The user's concept-drift score is then used in computing the confidence weights of the ratings, which yields different confidence weights for ratings from different users even when they have the same age.

In the following, we explain in detail the users' concept-drift scores, the rating weights, and the model learning algorithm.

Footnote 1: In CTR, the matrix of items' latent topics $\theta$ is considered a variable as well, but we removed it from the list of variables for simplicity because experiments in [19] showed that fixing $\theta$ to the result of LDA gives comparable performance.

4.1 Concept-drift Score

The similarity between items that appear successively in the user's list of relevant items gives important evidence of whether the user has a consistent taste or tends to drift in interest. The lower this similarity, the higher the likelihood that the user experiences a concept-drift in her interest. To calculate similarities between items, we represent items by their latent topics. Therefore, given the items' textual content, we extract the set of $k$ latent topics for each item using LDA. Let $\theta \in \mathbb{R}^{m \times k}$ be the items-topics matrix computed by LDA, where $\theta_{il}$ is the probability of item $i$ having topic $l$. For each item, we keep only the representative topics, i.e., those topics whose probability is higher than a certain threshold $\tau$; $\hat{\theta}_i$ is the set of such topics. In our experiments, we chose $\tau = 0.01$ empirically:

\[
\hat{\theta}_i = \{\, l \mid \theta_{il} > \tau \,\}, \quad \forall i \in I
\]

Given the rating matrix $R$ and $\hat{\theta}$, we calculate the concept-drift scores as shown in Algorithm 1. For each user $u$, we first order $u$'s ratings by rating date. Then, based on the items' topics, we calculate the pairwise similarity between each two items $i$ and $j$ that appear successively in $u$'s ratings. In our implementation, we used the Jaccard similarity to calculate the similarity between two sets of topics. The concept-drift score is then calculated as (1 - average pairwise similarity). The result is the set of concept-drift scores $S$ for all users, which will be used in computing the confidence weights.
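The procedure of Algorithm 1 can be sketched for a single user as follows. This is a minimal sketch, not the released implementation: the function names, the (timestamp, item) input format, and the handling of the two edge cases (a user with fewer than two ratings, and two empty topic sets) are assumptions, since the text does not specify them.

```python
from typing import Dict, List, Sequence, Set, Tuple


def representative_topics(theta_i: Sequence[float], tau: float = 0.01) -> Set[int]:
    """Indices of the topics whose probability exceeds the threshold tau."""
    return {l for l, p in enumerate(theta_i) if p > tau}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two topic sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty topic sets count as dissimilar
    return len(a & b) / len(union)


def concept_drift_score(rated: List[Tuple[float, int]],
                        topics: Dict[int, Set[int]]) -> float:
    """1 - average Jaccard similarity of successively rated items.

    rated:  (timestamp, item_id) pairs of one user, in any order.
    topics: representative topic set for every item id.
    """
    ordered = [item for _, item in sorted(rated, key=lambda pair: pair[0])]
    if len(ordered) < 2:
        return 0.0  # assumption: no evidence of drift with fewer than 2 ratings
    sims = [jaccard(topics[x], topics[y]) for x, y in zip(ordered, ordered[1:])]
    return 1.0 - sum(sims) / len(sims)
```

A user whose successive items share all their topics gets a score of 0, while a user whose successive items never share a topic gets a score of 1.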

4.2 Ratings Confidence Weights

The confidence weight of a rating quantifies the rating's importance in representing user interest at a given time $T$ and controls how much a rating $R_{ui}$ contributes to learning the latent models of user $u$ and item $i$. Given the concept-drift scores $S$ for all users, we apply an exponential decay function (Equation 7) to compute the confidence weights $W_{ui}$ for all of $u$'s ratings based on the rating's age, $T - t_{R_{ui}}$. Here, the concept-drift score $S_u$ controls the steepness of the decay function and therefore plays the role of an aging factor: the higher the score, the steeper the function. Figure 1 demonstrates the influence of different concept-drift values on the confidence weights. This is the desired behavior to account for the differences in users' concept-drift behavior: users with higher concept-drift scores have a steeper curve and, as a result, their old ratings get lower weights. The granularity of the rating's age can be configured based on the underlying application. For example, in the paper recommendation scenario, we chose to measure the age in months.

\[
W_{ui} = \frac{2}{1 + e^{S_u (T - t_{R_{ui}})}} \tag{7}
\]

[Fig. 1: plot not recoverable; vertical axis: confidence weight.]

Algorithm 1: ConceptDrift
  Input: items' LDA topics $\hat{\theta}$, rating matrix $R$
  Result: list of users' concept-drift scores $S$
  Initialize $S$ to an empty list;
  for $u \in U$ do
    $P := \{ i \mid R_{u,i} = 1 \}$;
    Sort $P$ by the rating dates;
    Initialize $S_u$ to 0;
    for $i = 1$ to $|P| - 1$ do
      $S_u := S_u +$ Jaccard-similarity($\hat{\theta}_{P_i}$, $\hat{\theta}_{P_{i+1}}$);
    end
    $S_u := S_u / (|P| - 1)$;
    Append $1 - S_u$ to $S$;
  end

After computing the concept-drift scores and the rating confidence weights, we can learn the latent vectors $U$ and $V$ from $R$ and $\theta$ similarly to CTR, as explained in Subsection 3.3. However, the confidence scores are not taken from Equation 2; we use our calculated confidence weights instead. Thus, the confidence matrix $C$ is defined as follows:

\[
C_{ui} = \begin{cases} \max(W_{ui}, b), & \text{if } R_{ui} = 1 \\ b, & \text{otherwise} \end{cases}
\]

Here, $b$ is the confidence score for the unknown ratings, $\{R_{ui} \mid R_{ui} = 0\}$, and is set to a small value as in [16,19].
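The decay function of Equation 7 and the resulting confidence entries can be sketched as below. The function names, the default b = 0.01, the clamping of negative ages to zero, and the cap on the exponent (which only avoids floating-point overflow) are assumptions of this sketch.

```python
import math


def rating_weight(drift_score: float, age: float) -> float:
    """Equation 7: 2 / (1 + exp(S_u * age)), where age = T - t_Rui.

    Negative ages are clamped to 0 and the exponent is capped at 700
    to avoid overflow; both are assumptions of this sketch.
    """
    exponent = min(drift_score * max(age, 0.0), 700.0)
    return 2.0 / (1.0 + math.exp(exponent))


def confidence_weight(r_ui: int, drift_score: float, age: float,
                      b: float = 0.01) -> float:
    """Confidence entry C_ui: max(W_ui, b) for known ratings, b otherwise."""
    if r_ui == 1:
        return max(rating_weight(drift_score, age), b)
    return b
```

For the example of users A and B above, with hypothetical concept-drift scores of 1.0 for A and 0.1 for B and the age measured in months, a 6-month-old rating receives a weight of roughly 0.005 for A (raised to b = 0.01 by the max) and roughly 0.71 for B.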

After finding $U$ and $V$, we approximate the predicted ratings using Equation 3.

5 Experiments and Discussion

We conducted offline evaluations on a real-world dataset to demonstrate the effectiveness of our model and compare it against other state-of-the-art and basic approaches (see footnote 2). In this section, we introduce the dataset and the experimental setup. Then, we explain the conducted experiments and discuss the findings.

5.1 Dataset

We used a dataset from citeulike (see footnote 3). Citeulike allows users to create personalized digital libraries where they can bookmark and tag relevant scientific publications (papers, books, theses, ...). Our dataset spans over three years, from November 2004 to December 2007, and contains information about 210,137 papers and 3,039 users with a total of 284,960 ratings. All users have at least 10 papers in their libraries. Ratings are associated with timestamps that record the time of adding the paper to the user's library. We collected publication metadata such as title, abstract, publication year, and keywords. We defined the dataset vocabulary as a set of 19,871 words. It comprises all keywords associated with the papers, in addition to 10,000 words extracted from the articles' titles and abstracts. We kept only English words with more than 2 letters, applied stop-word removal and stemming, and finally removed very infrequent and very frequent words (those appearing in fewer than 3 documents or in more than 90% of all papers).
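The length and document-frequency filters described above can be sketched as follows. This is only an illustration under assumptions: the function and parameter names are hypothetical, the input is assumed to be already tokenized, and English-word detection, stop-word removal, and stemming are not covered.

```python
from collections import Counter
from typing import Iterable, List, Set


def filter_vocabulary(docs: List[Iterable[str]],
                      min_docs: int = 3,
                      max_doc_ratio: float = 0.9,
                      min_len: int = 3) -> Set[str]:
    """Keep words with at least min_len letters that appear in at least
    min_docs documents and in at most max_doc_ratio of all documents."""
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    n_docs = len(docs)
    return {
        word
        for word, count in doc_freq.items()
        if len(word) >= min_len
        and count >= min_docs
        and count <= max_doc_ratio * n_docs
    }
```

Counting each word once per document makes the thresholds refer to document frequency rather than raw term frequency, which matches the wording "appearing in fewer than 3 documents".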

5.2 Experimental Setup

To apply time-aware evaluations that simulate a real-world scenario, we followed the recommendations of Campos et al. [4]. We chose 5 different dates as split points, with each two successive dates 6 months apart. We simulated a real-life scenario where the recommender rebuilds its model at each split date to generate predictions for the next 6 months. This results in 5 folds, one fold for each split date. All ratings before the split date form the fold's training set, and ratings from the following 6 months form the fold's test set. The test sets contain only ratings from users that appear in the training set, because our method does not address the problem of new users (the cold-start problem). Note that test sets may contain papers unseen in the corresponding training sets. For each fold, we fit the model on the training set and test it on the test set. Table 1 shows the number of users, papers, and ratings in each fold for both training and test sets (see footnote 4). The recommender generates for each (user, paper) pair a scalar prediction score that represents the paper's relevance to the user. For each user, the papers are ranked by prediction score and the top $M$ papers are recommended. We evaluate the presented approach using the following ranking metrics, which are typical for evaluating recommender systems: Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (nDCG@M) [8], and Recall@M [19]. We report the average of these metrics over all users for each fold.

Footnote 2: Our implementation is available at: https://github.com/anasalzogbi/T-CTR
Footnote 3: http://www.citeulike.org
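For reference, the three ranking metrics can be sketched per user as below, using common binary-relevance definitions. Whether the evaluation in this paper uses exactly these variants (for example the logarithmic discount in nDCG) is an assumption; the function names are hypothetical. Averaging the per-user values over all users of a fold gives the reported numbers, with the mean of the reciprocal ranks being MRR.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[int], relevant: Set[int]) -> float:
    """1 / position of the first relevant item, or 0 if none is ranked."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def recall_at_m(ranked: Sequence[int], relevant: Set[int], m: int) -> float:
    """Fraction of the user's relevant items that appear in the top m."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:m] if item in relevant)
    return hits / len(relevant)


def ndcg_at_m(ranked: Sequence[int], relevant: Set[int], m: int) -> float:
    """Binary-gain DCG of the top m, normalized by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:m], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), m)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```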

5.3 Time-aware vs time-ignorant evaluations

In our initial experiment, we evaluate a state-of-the-art system on our dataset following the time-aware scheme. The goal is to study the applicability and the expected performance of such methods in real-world scenarios and to show how it deviates from the usually reported time-ignorant results. As a representative model, we chose CTR [19] (cf. Subsection 3.3). Figure 2 shows the performance of CTR evaluated under the time-ignorant and time-aware schemes. The results for all metrics show that the performance of the method drops significantly when a time-aware evaluation is imposed. We believe that the reason for this behavior is the concept-drift in users' interests, which can be explained as follows. In time-ignorant evaluations, training and test ratings are sampled randomly from the set of all available ratings. This allows the model to see training ratings from all time slots, including those that follow the test ratings, and to learn accordingly. In contrast, restricting the training ratings to time slots that precede the test ratings makes it more challenging for the fitted model to predict future ratings correctly.
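The difference between the two evaluation schemes can be sketched as two split functions. The (user, item, timestamp) tuple layout, the function names, and the test_ratio and seed parameters are assumptions of this sketch; only the rules themselves (training before the split date, test within the following horizon, test users restricted to training users, random sampling in the time-ignorant case) follow the description above.

```python
import random
from typing import List, Tuple

Rating = Tuple[str, str, float]  # (user, item, timestamp); layout is an assumption


def time_aware_split(ratings: List[Rating], split_time: float,
                     horizon: float) -> Tuple[List[Rating], List[Rating]]:
    """Train on ratings before split_time, test on the following horizon.

    Test ratings of users without any training rating are dropped.
    """
    train = [r for r in ratings if r[2] < split_time]
    known_users = {user for user, _, _ in train}
    test = [
        r for r in ratings
        if split_time <= r[2] < split_time + horizon and r[0] in known_users
    ]
    return train, test


def time_ignorant_split(ratings: List[Rating], test_ratio: float = 0.2,
                        seed: int = 0) -> Tuple[List[Rating], List[Rating]]:
    """Random split that ignores timestamps, so training data may lie
    in the future relative to test data."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```

Calling time_aware_split once per split date, with a horizon of 6 months, would reproduce the fold construction described in Subsection 5.2.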

5.4 T-CTR against baselines

To analyze the performance of our approach (T-CTR), we compare it with the following methods:

- CF: Collaborative Filtering for Implicit Feedback [7] is an effective matrix factorization method for positive-only (one-class) datasets. It factorizes the rating matrix and uses static confidence weights for known and unknown ratings (cf. Subsection 3.2).
- CTR: Collaborative Topic Regression [19] performs topic modeling and collaborative filtering simultaneously (cf. Subsection 3.3).
- CE: Collaborative Evolution for User Profiling [13]. This work represents the state of the art in time-aware MF-based recommender systems. It follows a different strategy than ours, in which the evolution of user latent models over time is learned by fitting an auto-regressive model. Its assumption is that the user latent model $U_u^t$ at time $t$ depends on the user's previous latent models $\{U_u^{t-j} \mid 1 \le j \le \varphi\}$. We chose this method as a representative of the methods [6,12] that learn the evolution of user models instead of applying a forgetting strategy.

Footnote 4: The dataset is available for public use at: http://dbis.informatik.uni-freiburg.de/forschung/projekte/SciPRec

[Fig. 2: plot not recoverable. Series: time-aware, time-ignorant; vertical axis: metric value; one panel is labeled MRR; horizontal tick values range from 5 to 200.]

Fig. 3: Performance comparison for T-CTR and the baseline methods. Evaluation metrics are shown for each fold. [Plot not recoverable; horizontal axis: fold (1 to 5).]

[...] scenario shows an additional shortcoming that contributes to its poor performance. Here, we give more insight into how CE works in order to explain its poor performance on our dataset. In CE, an auto-regressive model is learned from the first $T_0$ time intervals. It finds $\varphi$ coefficient matrices, each of which is a $k \times k$ matrix. Implementing this method requires several decisions that depend on the underlying dataset: first, the time interval (day, week, month, etc.); second, $T_0$, the number of time intervals used in fitting the auto-regressive model; and third, $\varphi$, the auto-regressive model's dimension (the number of historical time intervals). $T_0$ cannot exceed the number of available intervals in the underlying dataset (see Table 2). According to the description of CE in [13], in order to estimate the coefficient matrices correctly, the following condition should be met: $T_0 \ge k \varphi$. As $T_0$ is limited, $\varphi$ and $k$ cannot grow together. For example, consider the 4th fold: choosing one week as the time interval gives 126 intervals in the training set. We can assign 100 of them for learning the auto-regressive model: $T_0 = 100$. If we want the auto-regressive model to look 4 weeks into the past ($\varphi = 4$), then $k$ can be at most 25. This value of $k$ is too small to learn good latent models on our dataset; higher values for $\varphi$ and $k$ are desirable, but they are not possible as long as the condition above must be met. Although CE builds a time-aware model, the limitation explained here makes it inapplicable to real-world datasets where the rating frequency does not allow considering shorter time intervals. CTR performs better than CF and CE because it utilizes the content of the items in addition to the collaborative ratings. However, accounting for the concept-drift in user interest makes our method outperform all studied methods on all recorded metrics, as shown in Figure 3.
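The arithmetic behind this limitation is small enough to check directly. The helper below is a hypothetical illustration that assumes the condition reads as "the number of fitting intervals must be at least the latent dimension times the number of historical intervals", which is how the example in the text works out.

```python
def max_latent_dim(t0: int, ar_order: int) -> int:
    """Largest latent dimension k that satisfies t0 >= k * ar_order.

    t0:       number of time intervals used to fit the auto-regressive model.
    ar_order: number of historical time intervals the model looks back on.
    """
    if t0 <= 0 or ar_order <= 0:
        raise ValueError("t0 and ar_order must be positive")
    return t0 // ar_order
```

With 100 fitting intervals and a look-back of 4 intervals this gives 25, and even if all 126 weekly intervals of the 4th fold were used for fitting, the bound would only rise to 31.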

5.5 User-specific vs Common Concept-drift Scores

In our last experiment, we analyze the advantage of computing the concept-drift score for each user individually. To this end, we ran T-CTR with the following configurations: (a) T-CTR, where the concept-drift score is computed individually for each user as in Algorithm 1; (b) T-CTR-s, where a common concept-drift score ($s$) is set for all users. We chose three values for $s$ ranging from small to high: $s \in \{0.1, 0.5, 1\}$. As depicted in Figure 1, lower concept-drift scores give higher confidence weights to old ratings. When $s = 0.1$, for example, old ratings are lightly penalized, and when $s = 1$, old ratings are strongly penalized. The results are shown in Figure 4. The results for all evaluation metrics show that using individual concept-drift scores yields better results.

[Fig. 4: plot not recoverable. Legend entries as extracted: T-CTR, CTR-0.1, CTR-0.5, CTR-1; one panel is labeled MRR; horizontal axis: folds 1 to 5.]

To gain a better understanding of the role of the concept-drift score, we conducted a qualitative analysis. We considered the 5th fold and compared the performance of T-CTR-s for individual users across the different values of $s$. When increasing $s$ from 0.1 to 0.5, results improved for 87 users but worsened for 143; that is, for 87 users $s = 0.5$ is a better choice than $s = 0.1$, and for 143 users the opposite holds. A similar observation can be made when moving to $s = 1$: compared to $s = 0.5$, the results got better for 109 users and worse for 134 users. An interesting question is whether the users who got better results when increasing $s$ to 0.5 also show better results for $s = 1$. We found that not all users whose results improved for $s = 0.5$ also improved for $s = 1$: 46 of them got worse results, and 68 additional users showed better results. This analysis supports our assumption that each user has an individual concept-drift score that does not necessarily fit other users. Moreover, our suggested method of dynamically computing individual concept-drift scores leads to better results than assigning a common score to all users.
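A per-user comparison of this kind can be sketched as follows. The function name, the dictionary input format, and the tolerance eps are assumptions; the text does not state which metric was compared per user or how ties were treated, so users whose value changes by no more than eps are counted in neither group here.

```python
from typing import Dict, Set, Tuple


def compare_per_user(before: Dict[str, float], after: Dict[str, float],
                     eps: float = 1e-12) -> Tuple[Set[str], Set[str]]:
    """Return (improved, worsened) user sets between two configurations.

    before/after map each user to a per-user metric value obtained with
    two different common scores s. Only users present in both are compared.
    """
    common = before.keys() & after.keys()
    improved = {u for u in common if after[u] > before[u] + eps}
    worsened = {u for u in common if after[u] < before[u] - eps}
    return improved, worsened
```

Intersecting the improved set of one step (for example from 0.1 to 0.5) with the worsened set of the next step (from 0.5 to 1) yields the kind of count discussed above for users who benefit from a moderate score but not from a high one.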

6 Conclusion and Future Work

In this paper, we introduced T-CTR, a time-aware approach for recommending textual items. Based on the heterogeneity of the items in the user's historical ratings, we compute a personalized, user-specific concept-drift score. We then use these scores to calculate confidence weights for known ratings. These weights control the ratings' contribution to fitting the CTR model. The take-away message from this work is twofold: (a) to achieve a realistic evaluation, it is essential to conduct time-aware evaluation; and (b) as users have different concept-drift dynamics, concept-drift models should be computed for each user individually. The main aspect that we plan to investigate in future work is the design of a probabilistic model that learns the concept-drift score for each user instead of relying on the heuristic of averaging the similarity of the user's previous ratings.

References

1. Alzoghbi, A., Ayala, V.A.A., Fischer, P.M., Lausen, G.: Pubrec: Recommending publications based on publicly available meta-data. In: LWA Workshops: KDML, FGWM, IR, and FGDB, pp. 11-18 (2015)

2. Bansal, T., Belanger, D., McCallum, A.: Ask the GRU: Multi-task learning for deep text recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16. ACM (2016)

3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3(Jan), 993-1022 (2003)

4. Campos, P.G., Díez, F., Cantador, I.: Time-aware recommender systems: A comprehensive survey and analysis of existing evaluation protocols. User Modeling and User-Adapted Interaction 24(1-2), 67-119 (2014)

5. Ding, Y., Li, X.: Time weight collaborative filtering. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 485-492. ACM (2005)

6. Gao, L., Wu, J., Zhou, C., Hu, Y.: Collaborative dynamic sparse topic regression with user profile evolution for item recommendation. In: AAAI Conference on Artificial Intelligence, pp. 1316-1322 (2017)

7. Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Eighth IEEE International Conference on Data Mining, ICDM '08, pp. 263-272. IEEE (2008)

8. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4) (2002)

9. Koren, Y.: Collaborative filtering with temporal dynamics. Communications of the ACM 53(4), 89-97 (2010)

10. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30-37 (2009)

11. Liu, N.N., Zhao, M., Xiang, E., Yang, Q.: Online evolutionary collaborative filtering. In: Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10. ACM (2010)

12. Liu, X.: Modeling users' dynamic preference for personalized recommendation. In: Proceedings of the 24th International Conference on Artificial Intelligence, pp. 1785-1791 (2015)

13. Lu, Z., Pan, S.J., Li, Y., Jiang, J., Yang, Q.: Collaborative evolution for user profiling in recommender systems. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3804-3810 (2016)

14. Matuszyk, P., Vinagre, J., Spiliopoulou, M., Jorge, A.M., Gama, J.: Forgetting techniques for stream-based matrix factorization in recommender systems. Knowledge and Information Systems 55(2), 275-304 (2018)

15. Mnih, A., Salakhutdinov, R.R.: Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems, pp. 1257-1264 (2008)

16. Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q.: One-class collaborative filtering. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, pp. 502-511. IEEE Computer Society, Washington, DC, USA (2008)

17. Vinagre, J., Jorge, A.M.: Forgetting mechanisms for scalable collaborative filtering. Journal of the Brazilian Computer Society 18(4), 271-282 (2012)

18. Vinagre, J., Jorge, A.M., Gama, J.: An overview on the exploitation of time in collaborative filtering. Wiley Int. Rev. Data Min. and Knowl. Disc. 5(5) (2015)

19. Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448-456. ACM (2011)

20. Wang, H., Chen, B., Li, W.J.: Collaborative topic regression with social regularization for tag recommendation. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 2719-2725 (2013)

21. Wang, H., Wang, N., Yeung, D.Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15. ACM (2015)