=Paper= {{Paper |id=Vol-1618/IFUP_paper_4 |storemode=property |title= TOCCF: Time-Aware One-Class Collaborative Filtering |pdfUrl=https://ceur-ws.org/Vol-1618/IFUP_paper_4.pdf |volume=Vol-1618 |authors= Xinchao Chen,Weike Pan,Zhong Ming |dblpUrl=https://dblp.org/rec/conf/um/ChenPM16 }} == TOCCF: Time-Aware One-Class Collaborative Filtering== https://ceur-ws.org/Vol-1618/IFUP_paper_4.pdf
     TOCCF: Time-Aware One-Class Collaborative Filtering

                                  Xinchao Chen, Weike Pan* and Zhong Ming*
                    College of Computer Science and Software Engineering, Shenzhen University
                          xcchen1993@gmail.com, {panweike,mingz}@szu.edu.cn
                                                   *: corresponding author.




ABSTRACT                                                         reason that modeling one-class feedback is considered more
One-class collaborative filtering (OCCF), or recommenda-         important is simply due to the fact that users are somehow
tion with one-class feedback such as shopping records, has       reluctant to assign a multi-class score to a product after
recently gained more attention from researchers and practi-      purchasing.
tioners in the community. The main reason is that one-class         In order to model the one-class feedback, two main lines of
feedback in the form of (user, item) pairs are often more        techniques are usually adopted, which are parallel to that of
abundant than numerical ratings in the form of (user, item,      collaborative filtering, including memory-based OCCF and
rating) triples as exploited by traditional collaborative fil-   model-based OCCF. For memory-based OCCF, the only d-
tering algorithms. However, most of the previous work on         ifference from that of memory-based CF is that the simi-
OCCF do not consider the temporal context, which is known        larity between two users or two items are estimated based
of great importance to users’ preferences and behaviors. In      on the one-class feedback instead of the ratings. For model-
this paper, we first formally define a new problem called        based OCCF, the techniques are often different from that of
time-aware OCCF (TOCCF), and then design a novel time-           model-based CF, in particular of the underlying assumption
aware similarity learning (TSL) model accordingly. Our T-        for the learning task of positive feedback only and the predic-
SL is based on a novel time-aware weighting scheme and a         tion rule based on similarity learning. The most well-known
seminal work on similarity learning, and is able to learn the    preference assumption for one-class feedback is probably the
item similarities more accurately. Empirical studies on t-       pairwise preference assumption called Bayesian personalized
wo large real-world datasets show that our TSL model can         ranking defined on the difference between a purchased prod-
integrate the temporal information effectively, and perform      uct and an un-purchased one [7]. And the most recent work
significantly better than several state-of-the-art recommen-     on similarity learning approach is the factored item similar-
dation algorithms.                                               ity model (FISM) [2], which learns the latent representation
                                                                 of items with the assumption that the inner product of two
                                                                 items’ latent factors is their similarity.
CCS Concepts                                                        The aforementioned advances in modeling one-class feed-
•Information systems → Personalization; •Human-                  back in OCCF have indeed achieved great success in various
centered computing → Collaborative filtering;                    recommendation applications. However, we find that very
                                                                 few work have explicitly studied the temporal effect in OC-
Keywords                                                         CF, though it has shown to be very helpful in user behavior
                                                                 modeling in CF [3, 4]. A recent work on microblog recom-
Time-Aware; One-Class Collaborative Filtering; Factored          mendation [8] shows that temporal information is helpful.
Item Similarity Model                                            However, the time-aware weighting scheme [8] is designed
                                                                 for the specific application, where the items (or tweets) are
1.   INTRODUCTION                                                recorded with time when they arrive at the users instead of
  One-class collaborative filtering (OCCF) [2, 5] is a recent    when they are retweeted by the users. In the studied general
research focus in the community of recommender system-           time-aware OCCF, we only have the temporal information
s. In OCCF, the data we can exploit for recommendation           when users have actions to items.
are the so-called one-class feedback such as “transactions”         In this paper, we first design a time-aware weighting scheme
in e-commerce instead of multi-class feedback or numerical       for the reliability of the positive feedback, and then propose
ratings in traditional collaborative filtering problems. The     a time-aware similarity learning (TSL) model by integrating
                                                                 the weight as a confidence score into the similarity learning
                                                                 model [2]. The time complexity of TSL is the same with
                                                                 that of FISM [2]. We conduct extensive empirical studies
                                                                 on two public large datasets with the state-of-the-art base-
                                                                 lines of memory-based methods and model-based methods.
                                                                 The empirical results show that our new similarity learning
                                                                 model is simple but very effective in exploiting the time con-
                                                                 text, and is significantly better than the algorithms without
.                                                                modeling the temporal effect.
         Figure 1: An illustration of time-aware one-class collaborative filtering (TOCCF) and OCCF.


2.   TIME-AWARE SIMILARITY LEARNING                                  and Wi′ · can be learned via some pointwise or pairwise loss
                                                                     functions in an optimization problem.
2.1 Problem Definition                                                 With the learned similarity si′ i , the preference of user u
   In time-aware one-class collaborative filtering (TOCCF),          on item i can then be predicted as follows [2],
we have n users, m items and their positive feedback in the
                                                                                           X
                                                                                   r̂ui =         si′ i + bu + pi ,             (2)
form of (user, item, time) triples, e.g., (u, i, tui ), denoting                           i′ ∈Nu \{i}
user u has a positive feedback on item i at time tui . In
TOCCF, our goal is to learn users’ preferences from the              where bu and pi are preference bias of user u and popularity
positive feedback and associated temporal information, and           bias of item i, respectively.
provide a personalized ranked list of items for each user u
that he or she may like in the future.
                                                                     2.3 Time-Aware Similarity Learning
   Notice that in OCCF [5], the temporal information is not            We introduce a confidence measurement for an observed
exploited, i.e., the data is of (user, item) pairs. We illustrate    positive feedback (u, i, tui ),
the studied problem in Figure 1, where OCCF is a special                                               1
case of TOCCF and is represented as a mixed (user, item)                                 cui =                  ,                (3)
                                                                                                 (tτ + 1) − tui
feedback matrix ignoring the time context.
                                                                     where tτ is the largest time stamp (in day) in the training
                                                                     data, and thus (tτ + 1) is used to denote the current day.
                   Table 1: Some notations.                          Notice that we use the inverse of the difference between the
      T = {u, i, tui }     positive feedback                         current time (tτ + 1) and the time the positive feedback is
      Tu                   positive feedback of user u               issued tui because a more recent feedback is more reliable,
      T′                   sampled unobserved feedback               and is thus of high confidence.
      d∈R                  latent feature number                        Our proposed confidence measurement shown in Eq.(3)
      Vi· , Wi′ · ∈ R1×d latent feature vectors                      looks similar to but is very different from the pairwise con-
      bu ∈ R               user bias                                 fidence weight in BPRC [8], which is defined on two (user,
      pi ∈ R               item bias                                 item) pairs, i.e., cuij for (u, i) and (u, j), instead of on one
      rui ∈ {1, 0}         preference of (u, i)                      single (user, item) pair in Eq.(3). Notice that in BPRC [8],
                                                                     (u, i) denotes a retweet feedback and (u, j) denotes a non-
      r̂ui                 prediction of (u, i)
                                                                     retweet feedback, while both are associated with temporal
      α                    tradeoff parameter
                                                                     information of when the tweet is received by user u. In our
      T                    iteration number
                                                                     TOCCF, we only have the temporal information of the ob-
                                                                     served positive feedback, and thus the approach in [8] is also
                                                                     not applicable to our studied problem.
                                                                        Finally, we have a general time-aware weighting scheme,
2.2 Similarity Learning                                                                  
                                                                                            cui , if (u, i, tui ) ∈ Tu ,
   It is well known that similarity measurement is critical                       ωui =                                           (4)
                                                                                            1,    otherwise,
in collaborative filtering, because it determines the neigh-
borhood of a certain user u and thus affects the preference          which means that we will weight the known (i.e., observed
prediction of user u on other items. The state-of-the-art            feedback) only. It is thus a ponitwise confidence weight.
approach [2] does not adopt traditional similarity measure-             With the time-aware weighting scheme and the loss func-
ment such as Cosine similarity or Jaccard index, but turns           tion of FISM [2], we propose to solve the following optimiza-
to learn the similarity from the preference data, which is           tion problem,
empirically more adaptive to different datasets. Mathemat-                        X         1
ically, the learned similarity in FISM [2] is represented as           min                    ωui (rui − r̂ui )2 + R(V, W, b, p), (5)
                                                                     V,W,b,p                2
follows,                                                                     (u,i,tui )∈T ∪T
                                                                                          ′


                           1                                         where T ′ is a set of randomly sampled unobserved          feed-
                si′ i = p          Vi· WiT′ · ,               (1)    back with |T ′ | = 3|T |, R(V, W, b, p) = α2 m
                                                                                                                  P
                                                                                                                         ||V    ||2 +
                         |Ni \{i}|                                      Pm               2    Pn    2     Pm 2       i=1     i·
                                                                     α                      α           α
                                                                          i′ =1 ||Wi · || + 2  u=1 bu + 2  i=1 pi is the regular-
                                                                                    ′
                                                                      2
where Ni = {i′ |(u, i′ , tui′ ) ∈ Tu }, and Vi· , Wi′ · ∈ R1×d are   ization term commonly used to avoid overfitting. The opti-
latent feature vectors to be learned for item i and item i′ ,        mization problem can be solved in a commonly used gradient
respectively. The similarity si′ i or the latent vectors Vi·         descent algorithm [2].
Table 2: Recommendation performance of TSL, FISM, BPR, ICF with Cosine similarity, i.e., ICF(CS), and
Jaccard index, i.e., ICF(JI), on ML10M and Netflix using Precision@5, Recall@5, F1@5, NDCG@5 and 1-
call@5. The best results are marked in bold. We also include the searched value of the tradeoff parameter
and iteration number in model-based methods for reproductivity.

                                        Parameter                          Evaluation metric
                Dataset    Method
                                         α     T       Precision@5    Recall@5    F1@5     NDCG@5         1-call@5
                           ICF(JI)                       0.0163        0.0052    0.0064      0.0188        0.0720
                           ICF(CS)                       0.0158        0.0050    0.0062      0.0182        0.0697
                ML10M      BPR         0.01     100      0.1186        0.0437    0.0525      0.1305        0.4021
                           FISM        0.001   1000      0.1169        0.0418    0.0509      0.1270        0.4029
                           TSL         0.01     500      0.1391        0.0533    0.0633      0.1442       0.4552
                           ICF(JI)                       0.0333        0.0203    0.0196      0.0386        0.1247
                           ICF(CS)                       0.0333        0.0206    0.0198      0.0385        0.1258
                Netflix    BPR         0.001    500      0.0518        0.0219    0.0233      0.0561        0.1878
                           FISM        0.001   1000      0.0514        0.0237    0.0242      0.0563        0.1937
                           TSL         0.001    100      0.0628        0.0409    0.0365      0.0730       0.2451



3.   EXPERIMENTAL RESULTS
                                                                     Table 3: Description of the datasets used in the ex-
                                                                     periments, including the numbers of users (n), item-
3.1 Datasets and Evaluation Metrics                                  s (m), training feedback (|Tr |), validation feedback
   In order to verify the effectiveness of our proposed point-       (|Val |), and test feedback (|Te |).
wise weighting scheme and time-ware similarity learning (T-
SL) model, we use two large public datasets, i.e., MovieLens           Dataset        n    m       |Tr |  |Val |    |Te |
10M (we use ML10M for short) and Netflix, in our empir-
                                                                       ML10M       71567 10681 1277900 425967 425967
ical studies. ML10M contains about 10 million numerical
                                                                       Netflix    480189 17770 13900939 4633646 4633647
ratings from 71567 users and 10681 items, and Netflix con-
tains about 100 million numerical ratings from 480189 users
and 17770 items. In order to simulate the TOCCF problem
setting with time-aware positive feedback, for each dataset,
we first remove the (u, i, rui , tui ) quadruples with rui ≤ 4,
                                                                     3.2 Baselines and Parameter Settings
and then take the (u, i, tui ) triples from the remaining data.        In our empirical studies, we include several state-of-the-
   For the resulted time-aware one-class feedback of each            art methods for modeling one-class feedback in recommender
dataset, we further split it according to the time stamp in          systems, including neighborhood-based methods and factorization-
order to generate a copy of training data, validation data           based methods:
and test data. We illustrate the data generation procedure
                                                                        • ICF(JI): item-oriented neighborhood-based collabora-
in Figure 3. Specifically, we first use 60% feedback with the
                                                                          tive filtering with Jaccard index as the similarity mea-
smallest time stamps for training; and then from the left
                                                                          surement.
40% feedback, we randomly sample 20% feedback for vali-
dation and the remaining 20% for test. We put the statistics            • ICF(CS): ICF with Cosine similarity. Both ICF(JI)
of the resulted datasets in Table 3.                                      and ICF(CS) are simple but effective approaches for
                                                                          OCCF.

                                                                        • BPR: Bayesian personalized ranking [7]. BPR is a sem-
                                                                          inal work based on pairwise preference assumption and
                                                                          usually produces the best performance in different rec-
                                                                          ommendation tasks.
                                                                        • FISM: Factored item similarity models [2]. FISM is
                                                                          one of the most recent work based on pointwise pref-
Figure 3: An illustration of data generation for
                                                                          erence assumption and similarity learning.
training data, validation data and test data, where
training data are from the first 60% one-class feed-                    For neighborhood-based methods ICF(JI) and ICF(CS),
back, and validation and test data are from the re-                  we fix the size of nearest neighbors as 20. For factorization-
maining 40% feedback.                                                based methods BPR, FISM and our TSL, we fix the dimen-
                                                                     sion of latent space as d = 20 and the learning rate γ = 0.01.
  For one-class feedback in TOCCF, we adopt several com-             The iteration number in BPR, FISM and our TSL are chosen
monly used evaluation metrics in ranking-oriented item rec-          from T ∈ {100, 500, 1000} and the value of tradeoff param-
ommendation or information retrieval scenarios. In partic-           eters are chosen from α ∈ {0.001, 0.01, 0.1} all through the
ular, we check the top-K performance using Precision@K,              NDCG@5 on the validation data, i.e., there are nine combi-
Recall@K, F1@K, NDCG@K and 1-call@K.                                 nations of the value of the two types of parameters.
                  0.15                                                        0.1
                                                         ICF(JI)                                                   ICF(JI)
                                                         ICF(CS)                                                   ICF(CS)
                                                         BPR                                                       BPR
         NDCG@K                                          FISM




                                                                   NDCG@K
                   0.1                                                      0.075                                  FISM
                                                         TSL                                                       TSL


                  0.05                                                       0.05



                    0                                                       0.025
                         5        10                15                              5         10              15
                                       K                                                           K
                                ML10M                                                      Netflix


Figure 2: Recommendation performance of TSL and other methods on NDCG@K with different value of K.


3.3 Main Results                                                      ty learning approaches for recommendation with social and
  We report the main results in Table 2, from which we can            other auxiliary data [1, 6].
have the following observations,
     • Two neighborhood-based methods, i.e., ICF(JI) and
                                                                      5. ACKNOWLEDGMENT
       ICF(CS), are poor regarding the recommendation per-               We thank the support of Natural Science Foundation of
       formance, which is caused by the intransitivity of the         China (NSFC) No. 61502307, and Natural Science Founda-
       similarity measurements for the scarce positive feed-          tion of Guangdong Province No. 2014A030310268 and No.
       back. Notice that the density of the training data of          2016A030313038.
       ML10M and Netflix are smaller than 0.2%.
                                                                      6. REFERENCES
     • Two factorization-based methods, i.e., BPR and FIS-
                                                                      [1] H. Fang, G. Guo, and J. Zhang. Multi-faceted trust and
       M, perform much better than the neighborhood-based
                                                                          distrust prediction for recommender systems. Decision
       methods, which is expected because of the merit of
                                                                          Support Systems, 71:37–47, 2015.
       transitivity via learned latent factors.
                                                                      [2] S. Kabbur, X. Ning, and G. Karypis. Fism: Factored
     • Our proposed time-aware similarity learning method,                item similarity models for top-n recommender systems.
       i.e., TSL, further improves FISM and BPR significant-              In Proceedings of the 19th ACM SIGKDD International
       ly, from which we can clearly see the value of the tem-            Conference on Knowledge Discovery and Data Mining,
       poral information and the effectiveness of our weight-             KDD ’13, pages 659–667, 2013.
       ing scheme to integrate the time context.                      [3] Y. Koren. Collaborative filtering with temporal
                                                                          dynamics. In Proceedings of the 15th ACM SIGKDD
  For real-world deployment of a recommendation model,
                                                                          International Conference on Knowledge Discovery and
we usually pay more attention to its top-K performance,
                                                                          Data Mining, pages 447–456, 2009.
because that will affect users’ behaviors most. For this rea-
son, we also check the recommendation performance with                [4] S. Liu, S. Wang, and F. Zhu. Structured learning from
different value of K ∈ {5, 10, 15}. We show the results of                heterogeneous behavior for social identity linkage.
NDCG@K in Figure 2. Notice that the results on other top-                 IEEE Transactions on Knowledge and Data
K performance are similar, and are thus not included due to               Engineering, 27(7):2005–2019, 2015.
space limitation. From Figure 2, we can see that the perfor-          [5] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose,
mance ordering on different value of K over two datasets is               M. Scholz, and Q. Yang. One-class collaborative
ICF(JI), ICF(CS) < BPR, FISM < TSL, which is consistent                   filtering. In Proceedings of the 8th IEEE International
to that of Table 2. The results on NDCG@K again show the                  Conference on Data Mining, ICDM ’08, pages 502–511,
usefulness of the temporal context and effectiveness of our               2008.
time-aware weighting scheme in similarity learning.                   [6] W. Pan. A survey of transfer learning for collaborative
                                                                          recommendation with auxiliary data. Neurocomputing,
                                                                          177:447–453, 2016.
4.    CONCLUSIONS AND FUTURE WORK
                                                                      [7] S. Rendle, C. Freudenthaler, Z. Gantner, and
   In this paper, we study an important recommendation                    L. Schmidt-Thieme. Bpr: Bayesian personalized
problem termed time-aware one-class collaborative filtering               ranking from implicit feedback. In Proceedings of the
(TOCCF), and propose a novel time-aware similarity learn-                 25th Conference on Uncertainty in Artificial
ing (TSL) model based on the seminal work of factored item                Intelligence, UAI ’09, pages 452–461, 2009.
similarity model [2]. Empirical results show that our TSL
                                                                      [8] S. Wang, X. Zhou, Z. Wang, and M. Zhang. Please
can incorporate the time information in a simple but ef-
                                                                          spread: Recommending tweets for retweeting with
fective way, and is able to recommend significantly more
                                                                          implicit feedback. In Proceedings of the 2012 Workshop
accurate ranked lists of items than several state-of-the-art
                                                                          on Data-driven User Behavioral Modelling and Mining
methods without modeling the time information.
                                                                          from Social Media, DUBMMSM ’12, pages 19–22, 2012.
   For future work, we are interested in generalizing our time-
aware similarity learning model to more advanced similari-