Divide and Transfer: Understanding Latent Factors for Recommendation Tasks

Vidyadhar Rao, TCS Research Labs, India, vidyadhar.rao@tcs.com
Rosni K V∗, University of Hyderabad, India, rosnikv@gmail.com
Vineet Padmanabhan, University of Hyderabad, India, vineetcs@uohyd.ernet.in

∗ The work was conducted during an internship at TCS Research Labs, India.

RecSysKTL Workshop @ ACM RecSys '17, August 27, 2017, Como, Italy. © 2017 Copyright is held by the author(s).

ABSTRACT

Traditionally, latent factor models have been the most successful techniques to build recommendation systems. While the key is to capture the user interests effectively, most research is focused on learning latent factors under cold-start and data sparsity situations. Our work brings a complementary approach to the previous studies, showing that understanding the semantic aspects of latent factors could give a hint on how to transfer useful knowledge from auxiliary domain(s) to the target domain. In this work, we propose a collaborative filtering technique that can effectively utilize the user preferences and content information. In our approach, we follow a divide and transfer strategy that could derive semantically meaningful latent factors and utilize only the appropriate components for recommendations. We demonstrate the effectiveness of our approach, due to an improved latent feature space, in both single and cross-domain tasks. Further, we also show its robustness by performing extensive experiments under cold-start and data sparsity contexts.

CCS CONCEPTS

• Information systems → Recommender systems; Collaborative filtering; • Computing methodologies → Topic modeling; Learning latent representations;

1 INTRODUCTION

Most e-commerce businesses want to help their customers surf through items that might interest them. Some examples include recommending books by Goodreads, products by Amazon, movies by Netflix, music by Last.fm, news articles by Google, etc. The most popular recommender systems follow two paradigms. Collaborative filtering (CF): utilize the preferences from a group of users and suggest items to other users; and Content-based filtering: recommend items that are similar to those that a user liked in the past. In general, effective recommendations are obtained by combining both content-based and collaborative features.

While many of these approaches are shown to be effective (in a single domain), in practice they have to operate in a challenging environment (across domains) and deliver more desirable recommendations. For example, based on a user's watched list of movies, it may be easy to recommend upcoming new movies, but how do we recommend books that have similar plots? Typically, in single-domain recommendation, user preferences from only one domain are used to recommend items within the same domain, and in cross-domain recommendation, user preferences from auxiliary domain(s) are used to recommend items in another domain. Hence, producing meaningful recommendations depends on how well the assumptions on the source domain align with the operating environment. While the key is to transfer user interests from the source to the target domain, this problem has two characteristics: (1) the cold-start problem, i.e., shortage of information for new users or new items; and (2) the data sparsity problem, i.e., users generally rate only a limited number of items.

Figure 1: Illustration of the cross-domain scenario: the same users might have different rating preferences for two genres in a movie recommendation system.

Traditionally, in a single domain, latent factor models [8] are used to transform the users and items into a common latent feature space. Intuitively, the user factors encode the 'preferences' while the item factors encode the 'properties'. However, the user and item latent factors have no interpretable meaning in natural language. Moreover, these techniques fail in cross-domain scenarios because the learned latent features may not align over different domains. Thus, understanding the semantic aspects of the latent factors is highly desirable in cross-domain research under cold-start and data sparsity contexts.

In the cross-domain scenario, tensor factorization models [7] try to represent the user-item-domain interaction with a tensor of order three and factorize users, items and domains into latent feature vectors. Essentially, these methods improve recommendations when the rating matrices from auxiliary domains share similar user-item rating patterns. However, the user behavior may not be the same in all domains, and each user might have different domains of interest (see Fig. 1). Moreover, when the auxiliary information from multiple sources is combined, the learned latent features may sometimes degrade the quality of recommendations (see Section 6.2.1).

To address these shortcomings, we propose a method that can derive semantically meaningful latent factors in a fully automatic manner, and can hopefully improve the quality of recommendations. The major contributions of this work are:

(1) We hypothesize that the intent of the users is not significantly different with respect to the document-specific and corpus-specific background information, and thus this information can be ignored when learning the latent factors.
(2) We propose a collaborative filtering technique that segments the latent factors into semantic units and transfers only the useful components to the target domain (see Section 4).
(3) We demonstrate the superiority of the proposed method in both single and cross-domain settings. Further, we show the consistency of our approach, due to improved latent features, under the cold-start and data sparsity contexts (see Section 6).

2 RELATED WORK

Among the latent factor models, matrix factorization (and its variants) [8] is the popular technique that tries to approximate an observed rating matrix to derive latent features. The basic principle is to find a common low-dimensional representation for both users and items, i.e., to reduce the rank of the user-item matrix directly. Nevertheless, the reduction approach addresses the sparsity problem by removing the unrepresentative or insignificant users or items. Discarding any useful information in this process may hinder further progress in this direction. To mitigate the cold-start problem, variants of these models [6] exploit user preferences or behavior from implicit feedback to improve personalized recommendations. Our work differs from these as we do not completely remove the content information. Instead, we assume that the content information about the items could be captured from multiple semantic aspects, which are good at explaining the factors that contributed to the user preferences.

Another popular latent factor model is one-class collaborative filtering [14], which tries to resolve the data sparsity problem by interpreting the missing ratings as a mixture of negative examples and unlabeled positive examples. Essentially, the task is to distinguish between a user's lack of interest in an item and a user's lack of awareness of the item. Alternatively, others exploit user-generated information [9, 19] or create shared knowledge from rating matrices in multiple domains [4, 13, 15, 16]. However, these have limited utility as users tend to show different rating patterns across the domains. In our work, we do not use the user-generated content; we only use the user preferences along with the content information of items to learn the latent feature space.

While all these approaches try to mitigate the cold-start and data sparsity problems in the source domain, we focus on understanding the semantic aspects of the learned latent factors. Many methods [11, 18, 21, 23] try to adjust the latent factors according to the context. For example, in the job recommendation application, the cross-domain segmented model [18] introduces user-based domains to derive indicator features and segment the users into different domains. A projection-based method [23] learns a projection matrix for each user that is able to capture the complexities of their preferences towards certain items over others. Our work differs from these in the sense that we consider the domains based on item features and exploit the contextual information (such as text) in order to understand the meaning of the latent factors.

In our research, we built an algorithm inspired by the collaborative topic modeling technique [22] (see footnote 1), which can make recommendations by adjusting user preferences and content information. We combine this model with the special words with background model [3], which can account for the semantic aspects of texts from multiple domains. Thus, our model can transfer only the useful information to the target domain by improving the latent feature space. To validate our hypothesis, we conducted experiments in both single and cross-domain recommendations in extreme cold-start and data sparsity scenarios, and reflect on the factors affecting the quality of recommendations.

1 https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/

3 PRELIMINARIES

Our method follows the same line as the collaborative topic regression model (CTR) [22], in the sense that latent factors of the content information are integrated with the user preferences. The main difference between our model and this approach is the way in which we derive meaningful topical latent factors from the content information and enable better predictions on recommendation tasks in general. For this we use the special words with background model [3]. Before we describe our model, we give a brief review of the existing latent factor models which serve as a basis for our approach. The notations for the graphical models are given in Table 1.

Table 1: Notations used in this paper

Description                                                          Symbol
User node                                                            $U$
Item node                                                            $V$
Rating node                                                          $r$
Number of users                                                      $N_u$
Regularization parameter for user                                    $\lambda_u$
Regularization parameter for item                                    $\lambda_v$
Number of documents (items)                                          $D$
Number of topics                                                     $T$
Number of words in a document ($d$)                                  $N_d$
$n$-th word in the document ($d$)                                    $w_{dn}$
Topic assignment for document ($d$) and word ($w_{dn}$)              $z_d$, $z_{dn}$
Switch variable                                                      $x$
Prob. of topics given documents ($T \times D$)                       $\theta$
Prob. of switch variable given document ($D \times 3$)               $\lambda$
Prob. of words given special word distribution of document ($d$)     $\psi_d$
Prob. of words given topics ($W \times T$)                           $\phi$
Prob. of words given background distribution                         $\Omega$
Dirichlet prior on document-topic distribution, $\theta$ ($D \times T$)   $\alpha$
Dirichlet prior on switch variable distributions                     $\gamma$
Dirichlet prior on word-topic distribution ($\phi$)                  $\beta_0$
Dirichlet prior on special word distribution ($\psi$)                $\beta_1$
Dirichlet prior on background word distribution ($\Omega$)           $\beta_2$

Figure 2: Graphical representation of the latent topic models and recommendation models: (a) Probabilistic Matrix Factorization (PMF) [12], (b) Latent Dirichlet Allocation (LDA) [20], (c) Collaborative Topic Regression (CTRlda) [22], (d) Special Words with Background (SWB) [3], (e) Proposed Divide and Transfer Model (CTRswb). The hidden nodes (the topic proportions, assignments, and topics) are unshaded. The observed nodes (the ratings and words of the documents) are shaded.

3.1 Probabilistic Matrix Factorization

Matrix factorization models the user-item preference matrix as a product of two lower-rank user and item matrices. Given an observed matrix, matrix factorization for collaborative filtering can be generalized as a probabilistic model (PMF) [12], which scales linearly with the number of observations. The graphical model for PMF is shown in Fig. 2a. In the figure, a user $i$ is represented by a latent vector $u_i \in \mathbb{R}^K$ and an item $j$ by a latent vector $v_j \in \mathbb{R}^K$, where $K$ is the dimension of the shared low-dimensional latent space.

3.2 Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) [1, 20] is known to be a powerful technique for discovering and exploiting the hidden thematic structure in large archives of text. The principle behind LDA is that documents exhibit multiple topics, and each topic can be viewed as a probability distribution over a fixed vocabulary. The richer structure in the latent topic space allows documents to be interpreted in low-dimensional representations. Fig. 2b depicts the graphical model for LDA, where the latent factors word-topic ($\phi$) and topic-document ($\theta$) are inferred from a given collection of documents.

3.3 Collaborative Topic Regression

The collaborative topic regression (CTRlda) model [22] combines the latent topics from LDA and the user-item features from PMF to jointly explain the observed content and user ratings, respectively. This model has two benefits over traditional approaches: (a) it generalizes to unseen or new items, and (b) it generates interpretable user profiles. The graphical model for CTRlda is shown in Fig. 2c. More details about this model are given in Section 4.

3.4 Special Word Topic Model

The special words with background model (SWB) [3] is based on LDA and models words in a document as originating either from general topics, or from document-specific word distributions, or from a corpus-wide background distribution. To achieve this, the SWB model introduces additional switch variables into the LDA model to account for multiple word distributions. The SWB model has a general structure similar to the LDA model, as shown in Fig. 2d. The main advantage of the SWB model is that it can trade off between generality and specificity of documents in a fully probabilistic and automated manner. An incremental version of this model exploits this feature to build an automatic term extractor [10].

4 DIVIDE AND TRANSFER LATENT TOPICS

Consider a set of $I$ users, $J$ items, and the rating variable $r_{ij} \in \{0, 1\}$ that indicates whether user $i$ likes item $j$ or not. For each user, we want to recommend items that are previously unseen and potentially interesting. Traditionally, latent factor models try to learn a common latent space of the users and items, given an observed user-item preference matrix. Essentially, the recommendation problem minimizes the regularized squared error loss with respect to $(u_i)_{i=1}^{I}$ and $(v_j)_{j=1}^{J}$,

$$\min \sum_{i,j} (r_{ij} - u_i^T v_j)^2 + \lambda_u \|u_i\|^2 + \lambda_v \|v_j\|^2 \qquad (1)$$

where $\lambda_u$ and $\lambda_v$ are regularization parameters. Probabilistic matrix factorization (PMF) [12] solves this problem by drawing the rating for a given user-item pair from a Gaussian distribution given by

$$\hat{r}_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}) \qquad (2)$$

where $c_{ij}$ is a confidence parameter for $r_{ij}$. In our work, we are interested in jointly modeling the user preferences and the content information to improve the quality of recommendations. We restrict ourselves to the assumption that the content information from single/multiple domain(s) and the users share a common latent topic space. Our model builds on the CTRlda [22] model, which can effectively balance the user preferences and the content information. This is achieved by including the latent variable $\epsilon_j$ that offsets the topic proportion $\theta_j$, i.e., $v_j = \theta_j + \epsilon_j$, where the item latent vector $v_j$ is close to the topic proportion $\theta_j$ derived from LDA and could diverge from it if it has to. Here, the expectation of the rating for a user-item pair is a simple linear function of $\theta_j$, i.e.,

$$E[r_{ij} \mid u_i, \theta_j, \epsilon_j] = u_i^T (\theta_j + \epsilon_j) \qquad (3)$$

This explains how much of the prediction relies on content and how much it relies on how many users have rated an item. We propose a straightforward extension of CTRlda that replaces the topic proportions derived from LDA [1] with multiple semantic proportions derived from SWB [3] over the common topic space.

4.1 Graphical Model

Our model is based on placing additional latent variables into the CTRlda model that can account for semantic aspects of the latent factors. The graphical model of the divide and transfer model, referred to as 'CTRswb', is shown in Fig. 2e.

4.1.1 Deriving Semantic Factors [3]. As can be seen in the figure, the latent variable $x$, associated with each word, acts as a switch: when $x = 0$, the word is generated via the topic route; when $x = 1$, it is generated via the document-specific route; and when $x = 2$, it is generated via the background route, which is corpus specific. For the $x = 0$ case, like LDA, words are sampled from document-topic ($\theta$) and word-topic ($\phi$) multinomials with $\alpha$ and $\beta_0$ as the respective symmetric Dirichlet priors. For $x = 1$ or $x = 2$, words are sampled from document-specific ($\psi$) or corpus-specific ($\Omega$) multinomials with $\beta_1$ and $\beta_2$ as symmetric Dirichlet priors, respectively. The variable $x$ is sampled from a document-specific multinomial $\lambda$, which in turn has a symmetric Dirichlet prior, $\gamma$. Since the words are sampled from multiple topic routes, our model can automatically deduce the latent features in a precise and meaningful manner.

4.2 Posterior Inference

The generative process is described in Algorithm 1, which combines the SWB [3] model (lines 1-14) and the PMF [12] model (lines 15-26). We summarize the repeated sampling of word distributions for each topic and user factors, and the predictions of user-item ratings.

Algorithm 1: Generative process for CTRswb model

 1  Select a background distribution over words $\Omega \mid \beta_2 \sim Dir(\beta_2)$
 2  for each topic $k \in 1, \ldots, T$ do
 3      Select a word distribution $\phi_k \mid \beta_0 \sim Dir(\beta_0)$
 4  end
 5  for each document $d \in 1, \ldots, D$ do
 6      Select a distribution over topics $\theta_d \mid \alpha \sim Dir(\alpha)$
 7      Select a special-words distribution over words $\psi_d \mid \beta_1 \sim Dir(\beta_1)$
 8      Select a distribution over switch variables $\lambda_d \mid \gamma \sim Dir(\gamma)$
 9      for $n = 1 : N_d$ words in document $d$ do
10          Select a switch variable $x_{dn} \mid \lambda_d \sim Mult(\lambda_d)$
11          Select $z_{dn} \mid \{\theta_d, x_{dn}\} \sim Mult(\theta_d)^{\delta(x_{dn},0)} \, \delta(z_{dn}, SW)^{\delta(x_{dn},1)} \, \delta(z_{dn}, BG)^{\delta(x_{dn},2)}$
12          Generate a word: $w_{dn} \mid \{z_{dn}, x_{dn}, \phi, \psi_d, \Omega\} \sim Mult(\phi_{z_{dn}})^{\delta(x_{dn},0)} \, Mult(\psi_d)^{\delta(x_{dn},1)} \, Mult(\Omega)^{\delta(x_{dn},2)}$
13      end
14  end
15  for user $i \in 1, \ldots, N_u$ do
16      Draw $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_T)$
17  end
18  for item $j \in 1, \ldots, D$ do
19      Draw $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_T)$
20      Compute $v_j = \epsilon_j + \theta_j$
21  end
22  for user $i \in 1, \ldots, N_u$ do
23      for item $j \in 1, \ldots, D$ do
24          Draw $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij})$
25      end
26  end

4.2.1 Learning Parameters. Computing the full posterior of the parameters $u_i$, $v_j$, $\theta_j$ is intractable. Therefore, we adapt the EM-style algorithm, as in CTRlda [22], to learn the maximum-a-posteriori estimates. We refer the interested reader to CTRlda [22] for more details. It was shown that fixing $\theta_j$ as the estimate gives performance comparable with vanilla LDA. We discovered that the convergence of the EM algorithm improved significantly when $\theta_j$ from the SWB topic route is used as the initial estimate (see Fig. 3b).

4.2.2 Making Predictions. After learning (locally) all the parameters, subject to a convergence criterion, we can use the learned latent features $(u_i)_{i=1}^{I}$, $(v_j)_{j=1}^{J}$, $(\theta_j^*)_{j=1}^{J}$, $(\epsilon_j^*)_{j=1}^{J}$ in Eq. (3) for predictions. Note that for new or unseen items we do not have the offset value, i.e., $\epsilon_j = 0$; hence the prediction relies completely on the topic proportion derived from either of the latent models, LDA or SWB.

4.2.3 Discussion. It is a common practice that the ratings of items are given on a scale, and the latent models try to predict the rating for a new user-item pair. In such cases, factorization machines [17] are known to work for a variety of general prediction tasks like classification, regression, or ranking. In our setup, the ratings are binary, i.e., $r_{ij} \in \{0, 1\}$, where $r_{ij} = 0$ can be interpreted in two ways: either user $i$ is not interested in item $j$, or user $i$ does not know about item $j$. In a way, our goals differ from the prediction tasks considered in factorization machines. Our study shows we can make predictions for unseen items while deriving meaningful latent factors.

While making predictions for unseen items, it is important to see how effectively the models can fuse the content information from multiple sources. In our model, the semantic units are effective for the representation of latent factors and have advantages over the CTRlda model. While the user preferences across the domains are very different (see Fig. 1), the background word distributions are nearly the same across all items, and therefore their contribution towards $v_j$ is not significant. Additionally, the specific words that occur in the documents do not convey much information about the user preferences. Hence, we can discard the $\Omega$ and $\psi$ distributions and only use the $\theta_j$ derived from the general topic route of the SWB [3] model. Subsequently, we demonstrate that CTRswb could learn better representations for the latent features compared to CTRlda [22].

5 EXPERIMENTS

We demonstrate the efficacy of our approach (CTRswb) in both single and cross-domain scenarios on the CiteULike dataset and the MovieLens dataset, respectively. For the single domain, we adopt the same experiment settings as those of CTRlda [22]. Since cross-domain applications can be realized in multiple ways [2], we consider the shared-users setup across multiple domains in two different contexts: (1) recommendations in the cold-start context, where we study the impact of the number of topics in the auxiliary domain(s), and (2) recommendations in the data sparsity context, where we study the impact of the number of ratings in the auxiliary domain(s).

5.1 CiteULike Dataset

We conducted experiments in the single domain using a dataset from CiteULike (footnote 2), a free social network service for scholars which allows users to organize (in personal libraries) and share the papers they are reading. We use the metadata of CiteULike from [22], collected between 2004 and 2010. The dataset contains 204,986 pairs of observed ratings with 5551 users and 16,980 articles. Each user has 37 articles in their library on average, and only 7% of the users have more than 100 articles. That is, the density of the dataset is quite low: 0.2175%. Each item (article) is represented by its title and abstract. After preprocessing the corpus, 8000 unique words are generated as the vocabulary.

2 http://www.citeulike.org/

5.2 MovieLens Dataset

To conduct the experiments in cross-domain, we have used the dataset provided by GroupLens [5]. We extracted the five genres with the most ratings out of the 19 genres from the MovieLens 1M dataset: Action, Comedy, Drama, Romance, Thriller. Since the MovieLens dataset has only user-generated tags, we crawled IMDB (footnote 3) to get item descriptions for the movies. The basic statistics of the collected dataset are reported in Table 2.

3 http://www.imdb.com

Table 2: MovieLens 1M Data statistics

Movie Genre   No. Items   No. Users   No. Ratings   Rating Ratio
Drama         1,493       5,881       352,834       0.040
Comedy        1,163       5,881       354,455       0.052
Thriller      485         5,881       188,968       0.066
Romance       459         5,881       146,916       0.054
Action        495         5,881       256,515       0.088
Total         4,095       5,881       1,299,688     0.054

5.3 Evaluation Methodology and Metrics

We evaluate the recommendation tasks by using the standard performance metrics: Precision, Recall and Mean Average Precision (MAP). The results shown are averaged over all the users. In our studies, we set the parameters of PMF and CTRlda by referring to [22]. For PMF, $\lambda_u = \lambda_v = 0.01$, $a = 1$, $b = 0.01$. For the CTRlda model, $T = 200$, $\lambda_u = 0.01$, $\lambda_v = 100$, $a = 1$, $b = 0.01$. For the CTRswb model, we set $\alpha = 0.1$, $\beta_0 = \beta_2 = 0.01$, $\beta_1 = 0.0001$, $\gamma = 0.3$ (all weak symmetric priors are set to default), $T = 200$, $\lambda_u = 0.01$, $\lambda_v = 100$, $a = 1$, $b = 0.01$.

6 RESULTS AND DISCUSSION

6.1 Study I: Single Domain recommendations

In this set of experiments, we compare the performance of probabilistic matrix factorization (PMF), the CTR model [22] which makes use of latent topics from LDA (CTRlda), and the proposed CTRswb model. Fig. 3a shows our results on the CiteULike dataset under the settings defined in [22]. In the graph, we also show how the topic proportions from LDA and SWB alone (i.e., when the user rating patterns from the train set are not considered) make predictions on the test set for top-K (K from 20 to 300) recommendations.

Figure 3: CiteULike Dataset in single domain recommendations: (a) Recall measure: comparison of different recommendation algorithms in terms of recall. (b) Convergence curve of CTRlda and CTRswb w.r.t. the number of iterations during θ optimization. (c) θ optimization: performance of CTR based methods w.r.t. θ optimization in terms of recall. (Best viewed in color)

We can see that CTRswb consistently gives better recommendations than the other factor models for different numbers of recommendations. Moreover, the margin of improvement between the CTRswb and CTRlda methods is large for smaller numbers of recommendations. Clearly, the PMF model lacks the content information and the pure content-based models do not utilize user preferences, and therefore both under-perform w.r.t. the CTR based models.

Further, we also show the performance of the CTR based methods when subjected to iterative optimization of the parameter $\theta_j$. We observe that the CTRswb model converges faster than the CTRlda model, as plotted in Fig. 3b. Clearly, the error gap analysis shows that the latent topics transferred from the SWB model are in agreement with the consistent performance improvement of the CTRswb method over CTRlda.

In Fig. 3c, we show the performance of the CTR based methods both with and without $\theta_j$ optimization. The reason for the CTRswb method giving the best performance in both cases is that real-world item descriptions contain a lot of item-specific terms, which are not very helpful for the recommendations. By removing the background terms of the corpus and the specific terms of each item, we could aggregate the $\theta_j$ value in a precise manner.

6.2 Study II: Cross Domain recommendations

In the cross-domain settings, we consider every genre in the dataset as a target domain while the other genres are treated as its auxiliary domains. For example, if the "Action" genre is the target domain, the other four genres constitute the source domains.

6.2.1 Cold-start scenario and the impact of number of topics: In this study, we consider the scenario in which no rating information from the target domain is used while learning the latent topic features. From Table 2, we pick one of the genres as the target domain and create five cold-start scenarios (one for each genre in the dataset). We have run the algorithms PMF, LDA, SWB, CTRlda and CTRswb for each of the cold-start situations.

Figure 4: MovieLens Dataset in cross-domain recommendations: impact of different sizes of the latent topic space on the quality of recommendations. Panels: (a) Action, (b) Comedy, (c) Drama, (d) Romance, (e) Thriller, (f) Mean of all Genres. Here, we use one of the genres as the target domain and the remaining four as source domains. (Best viewed in color)

Figs. 4a-4e show the mean average precision for top-20 recommendations for the five target genres. We can see that the MAP score of the PMF model did not improve much when the number of latent factors is increased. Notice that, in many cases, the CTRlda method degrades the quality of recommendations when compared to traditional PMF. Moreover, CTRlda is highly sensitive to the number of latent factors, and we noticed that it consistently performs worse than CTRswb. This could be explained as one of the potential problems with learned topics that are obtained by feature fusion from multiple domains. The CTRswb approach explicitly models these aspects and provides the ability to improve the latent features. As we can see in the figure, our model consistently produces better quality recommendations for different numbers of latent factors.

Fig. 4f shows the performance when averaged over all genres. From the plot, we observed that using 80 latent factors showed the best performance for all genres except for the comedy genre. The deviation in the case of the "comedy" genre is expected, as the number of items in the source domains is relatively small. Table 3 shows the performance of the different recommendation algorithms when 80 latent topics are used. Clearly, the proposed CTRswb model significantly improves over CTRlda and the other methods in all the cold-start scenarios.

Table 3: MovieLens Dataset: Comparison of different recommendation algorithms in terms of MAP, Precision (P), Recall (R). Here, we show the performance with 80 latent factors for all the five cold-start scenarios. The best performance for every target domain is obtained by CTRswb in every column.

Genre     Method    MAP@20   P@10    P@20    R@10    R@20
Action    PMF       0.133    0.072   0.069   0.013   0.024
Action    LDA       0.057    0.025   0.026   0.005   0.01
Action    SWB       0.244    0.136   0.11    0.035   0.052
Action    CTRlda    0.099    0.061   0.057   0.013   0.025
Action    CTRswb    0.306    0.176   0.14    0.051   0.07
Comedy    PMF       0.101    0.05    0.049   0.008   0.014
Comedy    LDA       0.073    0.024   0.027   0.004   0.009
Comedy    SWB       0.122    0.059   0.05    0.009   0.016
Comedy    CTRlda    0.059    0.029   0.026   0.007   0.012
Comedy    CTRswb    0.147    0.074   0.061   0.011   0.018
Drama     PMF       0.09     0.039   0.038   0.006   0.012
Drama     LDA       0.075    0.027   0.027   0.004   0.009
Drama     SWB       0.1      0.044   0.041   0.009   0.016
Drama     CTRlda    0.024    0.011   0.013   0.001   0.004
Drama     CTRswb    0.235    0.07    0.055   0.02    0.026
Romance   PMF       0.099    0.048   0.046   0.015   0.028
Romance   LDA       0.038    0.012   0.014   0.004   0.009
Romance   SWB       0.094    0.029   0.025   0.024   0.037
Romance   CTRlda    0.056    0.036   0.024   0.022   0.027
Romance   CTRswb    0.367    0.084   0.06    0.061   0.07
Thriller  PMF       0.127    0.063   0.06    0.016   0.029
Thriller  LDA       0.076    0.035   0.028   0.012   0.018
Thriller  SWB       0.079    0.041   0.04    0.01    0.02
Thriller  CTRlda    0.084    0.038   0.031   0.016   0.027
Thriller  CTRswb    0.162    0.09    0.073   0.022   0.034

6.2.2 Data sparsity scenario and the impact of number of ratings: In this study, to explore the behavior of cross-domain recommendation, we examined the latent topic space under the data sparsity scenario. We use the same MovieLens data as in Table 2 and create 10 data sparsity situations by incrementally removing (at random) 10% of the ratings from the source genres. Throughout, as in the cold-start study, we do not use ratings from the target genre. To validate our findings, we show the evaluations only for the topic space of 80 latent factors. Figs. 5a-5e show the mean average precision of top-20 recommendations for different degrees of sparsity (rating ratio) in the source domain.

Figure 5: MovieLens Dataset in cross-domain recommendations: impact of different amounts of ratings from the source domains on the quality of recommendations. Panels: (a) Action, (b) Comedy, (c) Drama, (d) Romance, (e) Thriller, (f) Mean of all Genres. Here, we show the stability of the CTRswb method when subjected to both the cold-start (w.r.t. target domain) and data sparsity (w.r.t. source domains) scenarios. (Best viewed in color)

The effect of the number of ratings is much clearer and more straightforward than the effect of the number of latent factors. The results reveal that the number of ratings in the source genres has a significant impact on the accuracy. However, the scale of the impact is very different for each target domain, as the number of ratings in some genres is smaller. The plots show that the more user preferences are available in the auxiliary domains, the better the accuracy of recommendations in the target domain. When the number of ratings increases, PMF, LDA, SWB and CTRlda show moderate improvements in terms of MAP. Our approach consistently shows better performance, by a large margin, than these methods. Overall, the results show that the latent factors of CTRswb are very reliable and could improve the recommendations even under extremely sparse data scenarios.

7 CONCLUSIONS

We have proposed an approach to validate our hypothesis that the quality of recommendations can be improved by explicitly utilizing the general topic word distributions while learning the latent features. Our approach recommends items to users based on both content and user preferences, and can exploit the content information in both single and cross-domain scenarios. Our results in the single domain show its superiority over pure latent factor and CTRlda models, and the results in cross-domain demonstrate its robustness under cold-start and data sparsity situations.

In the future, we plan to explore cross-domain recommendation scenarios in heterogeneous settings (e.g., movies to books). In addition, we have used a simple collaborative filtering approach with no rating information from the target domain; we believe utilizing the target domain ratings could result in better cross-domain recommendations.

REFERENCES

[1] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research (2003).
[2] Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Cross-domain recommender systems. In Recommender Systems Handbook. Springer.
[3] Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers. 2007. Modeling general and specific aspects of documents with a probabilistic topic model. In NIPS.
[4] Wei Chen, Wynne Hsu, and Mong Li Lee. 2013. Making recommendations from multiple domains. In ACM SIGKDD.
[5] F Maxwell Harper and Joseph A Konstan. 2016. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) (2016).
[6] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM.
[7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. 2010. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In ACM RecSys.
[8] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009).
[9] Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Transfer learning for collaborative filtering via a rating-matrix generative model. In ICML. ACM.
[10] Sujian Li, Jiwei Li, Tao Song, Wenjie Li, and Baobao Chang. 2013. A novel topic model for automatic term extraction. In ACM SIGIR.
[11] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM RecSys.
[12] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS.
[13] Orly Moreno, Bracha Shapira, Lior Rokach, and Guy Shani. 2012. Talmud: transfer learning for multiple domains. In Proceedings of the 21st ACM international conference on Information and knowledge management.
[14] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In ICDM.
[15] Weike Pan, Nathan N Liu, Evan W Xiang, and Qiang Yang. 2011. Transfer learning to predict missing ratings via heterogeneous user feedbacks. In IJCAI.
[16] Weike Pan, Evan Wei Xiang, Nathan Nan Liu, and Qiang Yang. 2010. Transfer Learning in Collaborative Filtering for Sparsity Reduction. In AAAI.
[17] Steffen Rendle. 2010. Factorization machines. In ICDM.
[18] Shaghayegh Sahebi and Trevor Walker. 2014. Content-Based Cross-Domain Recommendations Using Segmented Models. CBRecSys (2014).
[19] Yue Shi, Martha Larson, and Alan Hanjalic. 2011. Tags as bridges between domains: Improving recommendation with tag-induced cross-domain collaborative filtering. In International Conference on User Modeling, Adaptation, and Personalization. Springer.
[20] Mark Steyvers and Tom Griffiths. 2007. Probabilistic topic models. Handbook of latent semantic analysis (2007).
[21] Fatemeh Vahedian and Robin D Burke. Predicting Component Utilities for Linear-Weighted Hybrid Recommendation.
[22] Chong Wang and David M Blei. 2011. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD.
[23] Tong Zhao, Julian McAuley, and Irwin King. 2015. Improving latent factor models via personalized feature projection for one class recommendation. In ACM CIKM.
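
The prediction rule of Eq. (3) in Section 4, together with the cold-start case of Section 4.2.2 where the offset is zero, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names `expected_rating` and `top_k_items` are hypothetical, latent vectors are plain Python lists, and the learned factors are assumed to be given.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def expected_rating(u, theta, eps=None):
    """Eq. (3): E[r_ij | u_i, theta_j, eps_j] = u_i^T (theta_j + eps_j).

    For a new or unseen item no offset has been learned, so eps is taken
    as the zero vector and the score relies on the topic proportion only.
    """
    if eps is None:
        eps = [0.0] * len(theta)
    v = [t + e for t, e in zip(theta, eps)]  # item vector v_j = theta_j + eps_j
    return dot(u, v)


def top_k_items(u, thetas, offsets, k):
    """Rank all items for one user by expected rating and keep the first k."""
    scores = [expected_rating(u, th, ep) for th, ep in zip(thetas, offsets)]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return ranked[:k]
```

Passing `None` as the offset of an item reproduces the cold-start setting of the cross-domain studies, where the ranking depends only on the topic proportions taken from LDA or from the general topic route of SWB.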
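
The per-document part of Algorithm 1 (lines 5-14) can be sketched as a sampler for a single document. The sketch assumes the switch convention of Section 4.1.1 (x = 0 topic route, x = 1 document-specific route, x = 2 background route), stores the word-topic distributions as one list per topic, and uses hypothetical helper names (`sample_dirichlet`, `sample_index`, `generate_document`). It only generates data; it is not the inference procedure used in the paper.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Symmetric Dirichlet draw via normalized Gamma variates.

    Very small concentrations can underflow to all zeros, so this sketch
    is meant for moderate values only.
    """
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw one index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(rng, n_words, phi, omega, alpha, beta1, gamma):
    """Generate one document: phi[k] is the word distribution of topic k,
    omega is the corpus-wide background word distribution."""
    n_topics, vocab = len(phi), len(omega)
    theta = sample_dirichlet(rng, alpha, n_topics)  # line 6: topic proportions
    psi = sample_dirichlet(rng, beta1, vocab)       # line 7: special words
    lam = sample_dirichlet(rng, gamma, 3)           # line 8: switch distribution
    words, routes = [], []
    for _ in range(n_words):
        x = sample_index(rng, lam)                  # line 10: pick a route
        if x == 0:                                  # general topic route
            z = sample_index(rng, theta)
            w = sample_index(rng, phi[z])
        elif x == 1:                                # document-specific route
            w = sample_index(rng, psi)
        else:                                       # background route
            w = sample_index(rng, omega)
        words.append(w)
        routes.append(x)
    return words, routes, theta
```

Only `theta` is returned alongside the words because, following the discussion in Section 4.2.3, the document-specific and background distributions are discarded and only the general topic proportions feed into the item vector.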
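
Section 5.3 names Precision, Recall and Mean Average Precision as the evaluation metrics but does not spell out their formulas. The sketch below uses common textbook definitions as an assumption; in particular, normalizing average precision by min(K, number of relevant items) is one of several conventions and may differ from what the experiments used. All function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that the user liked."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's liked items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Average of precision values at each rank where a liked item appears.

    Assumption: normalized by min(k, number of liked items).
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


def mean_over_users(metric, rankings, relevants, k):
    """Average a per-user metric over all users, as done for the reported results."""
    values = [metric(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values)
```

Calling `mean_over_users(average_precision_at_k, rankings, relevants, 20)` would give a MAP@20 value in the sense of this sketch.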
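
The density figure quoted for CiteULike in Section 5.1 and the "Rating Ratio" column of Table 2 are both consistent with the number of observed ratings divided by the number of user-item pairs. The following sketch checks that reading against the reported numbers; the function name `rating_density` is hypothetical.

```python
def rating_density(n_ratings, n_users, n_items):
    """Observed ratings as a fraction of all possible user-item pairs."""
    return n_ratings / (n_users * n_items)


# CiteULike (Section 5.1): 204,986 ratings, 5551 users, 16,980 articles.
citeulike = rating_density(204986, 5551, 16980)

# MovieLens genres (Table 2): Drama and Action rows.
drama = rating_density(352834, 5881, 1493)
action = rating_density(256515, 5881, 495)

print(f"CiteULike density: {100 * citeulike:.4f}%")
print(f"Drama rating ratio: {drama:.3f}, Action rating ratio: {action:.3f}")
```

Under this reading, the sparsity study in Section 6.2.2 lowers the rating ratio of the source genres step by step while the number of users and items stays fixed.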