=Paper=
{{Paper
|id=Vol-1618/IFUP_paper_4
|storemode=property
|title= TOCCF: Time-Aware One-Class Collaborative Filtering
|pdfUrl=https://ceur-ws.org/Vol-1618/IFUP_paper_4.pdf
|volume=Vol-1618
|authors= Xinchao Chen,Weike Pan,Zhong Ming
|dblpUrl=https://dblp.org/rec/conf/um/ChenPM16
}}
== TOCCF: Time-Aware One-Class Collaborative Filtering ==
Xinchao Chen, Weike Pan* and Zhong Ming*
College of Computer Science and Software Engineering, Shenzhen University
xcchen1993@gmail.com, {panweike,mingz}@szu.edu.cn
*: corresponding author

ABSTRACT

One-class collaborative filtering (OCCF), or recommendation with one-class feedback such as shopping records, has recently gained increasing attention from researchers and practitioners in the community. The main reason is that one-class feedback in the form of (user, item) pairs is often more abundant than the numerical ratings in the form of (user, item, rating) triples exploited by traditional collaborative filtering algorithms. However, most previous work on OCCF does not consider the temporal context, which is known to be of great importance to users' preferences and behaviors. In this paper, we first formally define a new problem called time-aware OCCF (TOCCF), and then design a novel time-aware similarity learning (TSL) model accordingly. Our TSL is based on a novel time-aware weighting scheme and a seminal similarity learning model, and is able to learn item similarities more accurately. Empirical studies on two large real-world datasets show that our TSL model can integrate the temporal information effectively and performs significantly better than several state-of-the-art recommendation algorithms.

CCS Concepts

• Information systems → Personalization; • Human-centered computing → Collaborative filtering

Keywords

Time-Aware; One-Class Collaborative Filtering; Factored Item Similarity Model
1. INTRODUCTION

One-class collaborative filtering (OCCF) [2, 5] is a recent research focus in the community of recommender systems. In OCCF, the data we can exploit for recommendation are the so-called one-class feedback, such as "transactions" in e-commerce, instead of the multi-class feedback or numerical ratings used in traditional collaborative filtering problems. The reason that modeling one-class feedback is considered more important is simply that users are somewhat reluctant to assign a multi-class score to a product after purchasing it.

In order to model one-class feedback, two main lines of techniques are usually adopted, parallel to those of collaborative filtering: memory-based OCCF and model-based OCCF. For memory-based OCCF, the only difference from memory-based CF is that the similarity between two users or two items is estimated from the one-class feedback instead of the ratings. For model-based OCCF, the techniques are often different from those of model-based CF, in particular the underlying assumption for learning from positive feedback only and the prediction rule based on similarity learning. The most well-known preference assumption for one-class feedback is probably the pairwise preference assumption called Bayesian personalized ranking, defined on the difference between a purchased product and an un-purchased one [7]. The most recent work on the similarity learning approach is the factored item similarity model (FISM) [2], which learns latent representations of items under the assumption that the inner product of two items' latent factors is their similarity.

The aforementioned advances in modeling one-class feedback in OCCF have indeed achieved great success in various recommendation applications. However, we find that very little work has explicitly studied the temporal effect in OCCF, though it has been shown to be very helpful in user behavior modeling in CF [3, 4]. A recent work on microblog recommendation [8] shows that temporal information is helpful. However, the time-aware weighting scheme in [8] is designed for that specific application, where the items (i.e., tweets) are recorded with the time when they arrive at the users instead of when they are retweeted by the users. In the studied general time-aware OCCF, we only have the temporal information of when users act on items.

In this paper, we first design a time-aware weighting scheme for the reliability of the positive feedback, and then propose a time-aware similarity learning (TSL) model by integrating the weight as a confidence score into the similarity learning model [2]. The time complexity of TSL is the same as that of FISM [2]. We conduct extensive empirical studies on two large public datasets with state-of-the-art memory-based and model-based baselines. The empirical results show that our new similarity learning model is simple but very effective in exploiting the time context, and is significantly better than the algorithms that do not model the temporal effect.

Figure 1: An illustration of time-aware one-class collaborative filtering (TOCCF) and OCCF.

2. TIME-AWARE SIMILARITY LEARNING

2.1 Problem Definition

In time-aware one-class collaborative filtering (TOCCF), we have n users, m items and their positive feedback in the form of (user, item, time) triples, e.g., (u, i, t_{ui}), denoting that user u has a positive feedback on item i at time t_{ui}. In TOCCF, our goal is to learn users' preferences from the positive feedback and the associated temporal information, and to provide each user u with a personalized ranked list of items that he or she may like in the future.

Notice that in OCCF [5], the temporal information is not exploited, i.e., the data is of (user, item) pairs. We illustrate the studied problem in Figure 1, where OCCF is a special case of TOCCF and is represented as a mixed (user, item) feedback matrix ignoring the time context.

Table 1: Some notations.
T = {(u, i, t_{ui})}              positive feedback
T_u                               positive feedback of user u
T'                                sampled unobserved feedback
d ∈ R                             latent feature number
V_{i·}, W_{i'·} ∈ R^{1×d}         latent feature vectors
b_u ∈ R                           user bias
p_i ∈ R                           item bias
r_{ui} ∈ {1, 0}                   preference of (u, i)
r̂_{ui}                            prediction of (u, i)
α                                 tradeoff parameter
T                                 iteration number

2.2 Similarity Learning

It is well known that similarity measurement is critical in collaborative filtering, because it determines the neighborhood of a certain user u and thus affects the preference prediction of user u on other items. The state-of-the-art approach [2] does not adopt a traditional similarity measurement such as Cosine similarity or the Jaccard index, but instead learns the similarity from the preference data, which is empirically more adaptive to different datasets. Mathematically, the learned similarity in FISM [2] is represented as follows,

s_{i'i} = \frac{1}{\sqrt{|N_u \setminus \{i\}|}} V_{i\cdot} W_{i'\cdot}^T,    (1)

where N_u = \{i' \mid (u, i', t_{ui'}) \in T_u\}, and V_{i·}, W_{i'·} ∈ R^{1×d} are the latent feature vectors to be learned for item i and item i', respectively. The similarity s_{i'i}, or the latent vectors V_{i·} and W_{i'·}, can be learned via some pointwise or pairwise loss function in an optimization problem.

With the learned similarity s_{i'i}, the preference of user u on item i can then be predicted as follows [2],

\hat{r}_{ui} = \sum_{i' \in N_u \setminus \{i\}} s_{i'i} + b_u + p_i,    (2)

where b_u and p_i are the preference bias of user u and the popularity bias of item i, respectively.
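To make the prediction rule concrete, the following minimal sketch (in Python with NumPy; the function name, array layout and the argument N_u are our own illustrative choices, not code from the paper) evaluates Eqs. (1)-(2) for a single (user, item) pair given already learned factors.

```python
import numpy as np

def predict(u, i, V, W, b, p, N_u):
    """Predict r_hat_ui as in Eqs. (1)-(2): sum of learned item similarities
    s_{i'i} over the user's other consumed items, plus user and item biases.

    V, W : (m, d) arrays of item latent factors
    b    : (n,) array of user biases
    p    : (m,) array of item biases
    N_u  : set of items with observed positive feedback from user u
    """
    neighbors = [j for j in N_u if j != i]          # N_u \ {i}
    if not neighbors:
        return b[u] + p[i]
    norm = 1.0 / np.sqrt(len(neighbors))            # 1 / sqrt(|N_u \ {i}|)
    s = norm * (W[neighbors] @ V[i])                # s_{i'i} for each neighbor i', Eq. (1)
    return s.sum() + b[u] + p[i]                    # Eq. (2)
```

Ranking the unobserved items of user u by this score yields the personalized top-K list used in the experiments.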
2.3 Time-Aware Similarity Learning

We introduce a confidence measurement for an observed positive feedback (u, i, t_{ui}),

c_{ui} = \frac{1}{(t_\tau + 1) - t_{ui}},    (3)

where t_\tau is the largest time stamp (in days) in the training data, and thus (t_\tau + 1) is used to denote the current day. Notice that we use the inverse of the difference between the current time (t_\tau + 1) and the time t_{ui} at which the positive feedback was issued, because a more recent feedback is more reliable and is thus of higher confidence.

Our proposed confidence measurement in Eq. (3) looks similar to, but is very different from, the pairwise confidence weight in BPRC [8], which is defined on two (user, item) pairs, i.e., c_{uij} for (u, i) and (u, j), instead of on one single (user, item) pair as in Eq. (3). Notice that in BPRC [8], (u, i) denotes a retweet feedback and (u, j) denotes a non-retweet feedback, while both are associated with the temporal information of when the tweet was received by user u. In our TOCCF, we only have the temporal information of the observed positive feedback, and thus the approach in [8] is not applicable to our studied problem.

Finally, we have a general time-aware weighting scheme,

\omega_{ui} = \begin{cases} c_{ui}, & \text{if } (u, i, t_{ui}) \in T_u, \\ 1, & \text{otherwise}, \end{cases}    (4)

which means that we weight the known (i.e., observed) feedback only. It is thus a pointwise confidence weight.

With the time-aware weighting scheme and the loss function of FISM [2], we propose to solve the following optimization problem,

\min_{V,W,b,p} \sum_{(u,i,t_{ui}) \in T \cup T'} \frac{1}{2} \omega_{ui} (r_{ui} - \hat{r}_{ui})^2 + R(V, W, b, p),    (5)

where T' is a set of randomly sampled unobserved feedback with |T'| = 3|T|, and R(V, W, b, p) = \frac{\alpha}{2} \sum_{i=1}^{m} \|V_{i\cdot}\|^2 + \frac{\alpha}{2} \sum_{i'=1}^{m} \|W_{i'\cdot}\|^2 + \frac{\alpha}{2} \sum_{u=1}^{n} b_u^2 + \frac{\alpha}{2} \sum_{i=1}^{m} p_i^2 is the regularization term commonly used to avoid overfitting. The optimization problem can be solved with a commonly used gradient descent algorithm [2].
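As a rough illustration of how the weighting scheme enters the learning loop, the sketch below (our own Python code, not the authors' implementation; the function names, the learning rate handling and the exact update order are assumptions) computes the confidence of Eq. (3) and applies one stochastic gradient step on the weighted squared loss of Eq. (5) for a single training pair.

```python
import numpy as np

def confidence(t_ui, t_tau):
    """Time-aware confidence of Eq. (3): 1 / ((t_tau + 1) - t_ui), with time in days."""
    return 1.0 / ((t_tau + 1) - t_ui)

def sgd_step(u, i, r_ui, w_ui, V, W, b, p, N_u, gamma=0.01, alpha=0.01):
    """One stochastic gradient step on 0.5 * w_ui * (r_ui - r_hat_ui)^2 plus L2 terms.

    For an observed triple: r_ui = 1 and w_ui = confidence(t_ui, t_tau).
    For a sampled unobserved pair from T': r_ui = 0 and w_ui = 1, as in Eq. (4).
    """
    neighbors = [j for j in N_u if j != i]                      # N_u \ {i}
    norm = 1.0 / np.sqrt(len(neighbors)) if neighbors else 0.0
    r_hat = norm * (W[neighbors] @ V[i]).sum() + b[u] + p[i]    # Eq. (2)
    e = -w_ui * (r_ui - r_hat)                                  # dLoss / d(r_hat)
    b[u] -= gamma * (e + alpha * b[u])
    p[i] -= gamma * (e + alpha * p[i])
    if neighbors:
        V_i_old = V[i].copy()
        V[i] -= gamma * (e * norm * W[neighbors].sum(axis=0) + alpha * V[i])
        W[neighbors] -= gamma * (e * norm * V_i_old + alpha * W[neighbors])
```

Setting w_ui = 1 everywhere recovers a plain FISM-style update, which is why the time complexity of TSL matches that of FISM.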
3. EXPERIMENTAL RESULTS

3.1 Datasets and Evaluation Metrics

In order to verify the effectiveness of our proposed pointwise weighting scheme and time-aware similarity learning (TSL) model, we use two large public datasets, i.e., MovieLens 10M (ML10M for short) and Netflix, in our empirical studies. ML10M contains about 10 million numerical ratings from 71567 users and 10681 items, and Netflix contains about 100 million numerical ratings from 480189 users and 17770 items. In order to simulate the TOCCF problem setting with time-aware positive feedback, for each dataset we first remove the (u, i, r_{ui}, t_{ui}) quadruples with r_{ui} ≤ 4, and then take the (u, i, t_{ui}) triples from the remaining data.

For the resulting time-aware one-class feedback of each dataset, we further split it according to the time stamps in order to generate training data, validation data and test data. We illustrate the data generation procedure in Figure 3. Specifically, we first use the 60% of feedback with the smallest time stamps for training; from the remaining 40% of feedback, we then randomly sample 20% for validation and use the remaining 20% for test. We report the statistics of the resulting datasets in Table 3.

Figure 3: An illustration of data generation for training data, validation data and test data, where training data are from the first 60% of one-class feedback, and validation and test data are from the remaining 40% of feedback.

Table 3: Description of the datasets used in the experiments, including the numbers of users (n), items (m), training feedback (|Tr|), validation feedback (|Val|), and test feedback (|Te|).

Dataset    n        m      |Tr|       |Val|     |Te|
ML10M      71567    10681  1277900    425967    425967
Netflix    480189   17770  13900939   4633646   4633647
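A minimal sketch of this data generation procedure, assuming the raw ratings are available in memory as (user, item, rating, timestamp) tuples (the function name, tuple layout and random seed are our assumptions, not specified by the paper):

```python
import random

def build_toccf_splits(quadruples, seed=0):
    """Simulate the TOCCF setting from (u, i, r_ui, t_ui) quadruples:
    keep only positive feedback, order it by time, use the earliest 60% for
    training, and randomly split the remaining 40% into validation and test."""
    # the paper removes quadruples with r_ui <= 4, so only higher ratings remain
    triples = [(u, i, t) for (u, i, r, t) in quadruples if r > 4]
    triples.sort(key=lambda x: x[2])                 # order by time stamp
    cut = int(0.6 * len(triples))
    train, rest = triples[:cut], triples[cut:]
    random.Random(seed).shuffle(rest)
    half = len(rest) // 2
    valid, test = rest[:half], rest[half:]
    return train, valid, test
```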
For one-class feedback in TOCCF, we adopt several commonly used evaluation metrics from ranking-oriented item recommendation and information retrieval scenarios. In particular, we check the top-K performance using Precision@K, Recall@K, F1@K, NDCG@K and 1-call@K.

3.2 Baselines and Parameter Settings

In our empirical studies, we include several state-of-the-art methods for modeling one-class feedback in recommender systems, including neighborhood-based methods and factorization-based methods:

• ICF(JI): item-oriented neighborhood-based collaborative filtering with the Jaccard index as the similarity measurement.
• ICF(CS): ICF with Cosine similarity. Both ICF(JI) and ICF(CS) are simple but effective approaches for OCCF.
• BPR: Bayesian personalized ranking [7]. BPR is a seminal work based on the pairwise preference assumption and usually produces the best performance in different recommendation tasks.
• FISM: factored item similarity models [2]. FISM is one of the most recent works based on the pointwise preference assumption and similarity learning.

For the neighborhood-based methods ICF(JI) and ICF(CS), we fix the size of the nearest neighborhood as 20. For the factorization-based methods BPR, FISM and our TSL, we fix the dimension of the latent space as d = 20 and the learning rate as γ = 0.01. The iteration number of BPR, FISM and our TSL is chosen from T ∈ {100, 500, 1000} and the value of the tradeoff parameter is chosen from α ∈ {0.001, 0.01, 0.1}, in both cases via NDCG@5 on the validation data, i.e., there are nine combinations of the values of the two parameters.
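For reference, a minimal sketch of the item-to-item Jaccard similarity used by the ICF(JI) baseline (a standard formula; this naive implementation and its names are our own, not the authors' code), computed from the one-class training feedback:

```python
def jaccard_item_similarity(train_triples):
    """Item-to-item Jaccard index over one-class feedback:
    |U_i & U_j| / |U_i | U_j|, where U_i is the set of users who consumed item i."""
    users_of = {}
    for u, i, _t in train_triples:
        users_of.setdefault(i, set()).add(u)
    sim = {}
    items = list(users_of)
    for a in range(len(items)):
        for c in range(a + 1, len(items)):
            i, j = items[a], items[c]
            inter = len(users_of[i] & users_of[j])
            if inter:
                union = len(users_of[i] | users_of[j])
                sim[(i, j)] = sim[(j, i)] = inter / union
    return sim
```

In the experiments above each item keeps only its 20 nearest neighbors; the quadratic loop here is only meant to make the similarity definition explicit.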
Table 2: Recommendation performance of TSL, FISM, BPR, and ICF with Cosine similarity, i.e., ICF(CS), and the Jaccard index, i.e., ICF(JI), on ML10M and Netflix using Precision@5, Recall@5, F1@5, NDCG@5 and 1-call@5. The best results are marked in bold. We also include the searched values of the tradeoff parameter and the iteration number of the model-based methods for reproducibility.

Dataset  Method   α      T     Precision@5  Recall@5  F1@5    NDCG@5  1-call@5
ML10M    ICF(JI)  -      -     0.0163       0.0052    0.0064  0.0188  0.0720
ML10M    ICF(CS)  -      -     0.0158       0.0050    0.0062  0.0182  0.0697
ML10M    BPR      0.01   100   0.1186       0.0437    0.0525  0.1305  0.4021
ML10M    FISM     0.001  1000  0.1169       0.0418    0.0509  0.1270  0.4029
ML10M    TSL      0.01   500   0.1391       0.0533    0.0633  0.1442  0.4552
Netflix  ICF(JI)  -      -     0.0333       0.0203    0.0196  0.0386  0.1247
Netflix  ICF(CS)  -      -     0.0333       0.0206    0.0198  0.0385  0.1258
Netflix  BPR      0.001  500   0.0518       0.0219    0.0233  0.0561  0.1878
Netflix  FISM     0.001  1000  0.0514       0.0237    0.0242  0.0563  0.1937
Netflix  TSL      0.001  100   0.0628       0.0409    0.0365  0.0730  0.2451

3.3 Main Results

We report the main results in Table 2, from which we have the following observations:

• The two neighborhood-based methods, i.e., ICF(JI) and ICF(CS), perform poorly, which is caused by the intransitivity of the similarity measurements under scarce positive feedback. Notice that the densities of the training data of ML10M and Netflix are both smaller than 0.2%.
• The two factorization-based methods, i.e., BPR and FISM, perform much better than the neighborhood-based methods, which is expected because of the merit of transitivity via the learned latent factors.
• Our proposed time-aware similarity learning method, i.e., TSL, further improves over FISM and BPR significantly, from which we can clearly see the value of the temporal information and the effectiveness of our weighting scheme in integrating the time context.

For real-world deployment of a recommendation model, we usually pay more attention to its top-K performance, because that affects users' behaviors most. For this reason, we also check the recommendation performance with different values of K ∈ {5, 10, 15}. We show the results of NDCG@K in Figure 2. Notice that the results on the other top-K metrics are similar and are thus not included due to space limitations. From Figure 2, we can see that the performance ordering for different values of K over the two datasets is ICF(JI), ICF(CS) < BPR, FISM < TSL, which is consistent with that of Table 2. The results on NDCG@K again show the usefulness of the temporal context and the effectiveness of our time-aware weighting scheme in similarity learning.

Figure 2: Recommendation performance of TSL and other methods on NDCG@K with different values of K.

4. CONCLUSIONS AND FUTURE WORK

In this paper, we study an important recommendation problem termed time-aware one-class collaborative filtering (TOCCF), and propose a novel time-aware similarity learning (TSL) model based on the seminal factored item similarity model [2]. Empirical results show that our TSL can incorporate the time information in a simple but effective way, and is able to recommend significantly more accurate ranked lists of items than several state-of-the-art methods that do not model the time information.

For future work, we are interested in generalizing our time-aware similarity learning model to more advanced similarity learning approaches for recommendation with social and other auxiliary data [1, 6].

5. ACKNOWLEDGMENT

We thank the support of the Natural Science Foundation of China (NSFC) No. 61502307, and the Natural Science Foundation of Guangdong Province No. 2014A030310268 and No. 2016A030313038.

6. REFERENCES

[1] H. Fang, G. Guo, and J. Zhang. Multi-faceted trust and distrust prediction for recommender systems. Decision Support Systems, 71:37-47, 2015.
[2] S. Kabbur, X. Ning, and G. Karypis. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 659-667, 2013.
[3] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 447-456, 2009.
[4] S. Liu, S. Wang, and F. Zhu. Structured learning from heterogeneous behavior for social identity linkage. IEEE Transactions on Knowledge and Data Engineering, 27(7):2005-2019, 2015.
[5] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08, pages 502-511, 2008.
[6] W. Pan. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447-453, 2016.
[7] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 452-461, 2009.
[8] S. Wang, X. Zhou, Z. Wang, and M. Zhang. Please spread: Recommending tweets for retweeting with implicit feedback. In Proceedings of the 2012 Workshop on Data-driven User Behavioral Modelling and Mining from Social Media, DUBMMSM '12, pages 19-22, 2012.