Divide and Transfer: Understanding Latent Factors for Recommendation Tasks

Vidyadhar Rao, TCS Research Labs, India (vidyadhar.rao@tcs.com)
Rosni K V∗, University of Hyderabad, India (rosnikv@gmail.com)
Vineet Padmanabhan, University of Hyderabad, India (vineetcs@uohyd.ernet.in)

∗ The work was conducted during an internship at TCS Research Labs, India.
RecSysKTL Workshop @ ACM RecSys ’17, August 27, 2017, Como, Italy
© 2017 Copyright is held by the author(s).

ABSTRACT
Traditionally, latent factor models have been the most successful techniques for building recommendation systems. While the key is to capture user interests effectively, most research has focused on learning latent factors under cold-start and data sparsity situations. Our work brings a complementary approach to the previous studies, showing that understanding the semantic aspects of latent factors could give a hint on how to transfer useful knowledge from auxiliary domain(s) to the target domain.

In this work, we propose a collaborative filtering technique that can effectively utilize the user preferences and content information. In our approach, we follow a divide and transfer strategy that could derive semantically meaningful latent factors and utilize only the appropriate components for recommendations. We demonstrate the effectiveness of our approach, due to an improved latent feature space, in both single and cross-domain tasks. Further, we also show its robustness by performing extensive experiments under cold-start and data sparsity contexts.

CCS CONCEPTS
• Information systems → Recommender systems; Collaborative filtering; • Computing methodologies → Topic modeling; Learning latent representations;

1 INTRODUCTION
Most e-commerce businesses want to help their customers browse items that might interest them. Some examples include recommending books by Goodreads, products by Amazon, movies by Netflix, music by Last.fm, news articles by Google, etc. The most popular recommender systems follow two paradigms: collaborative filtering (CF), which utilizes the preferences from a group of users to suggest items to other users; and content-based filtering, which recommends items that are similar to those that a user liked in the past. In general, effective recommendations are obtained by combining both content-based and collaborative features.

While many of these approaches have been shown to be effective in a single domain, in practice they have to operate in a more challenging cross-domain environment and still deliver desirable recommendations. For example, based on a user's watch list of movies, it may be easy to recommend upcoming movies, but how do we recommend books that have similar plots? Typically, in the single-domain setting, user preferences from only one domain are used to recommend items within the same domain, and in the cross-domain setting, user preferences from auxiliary domain(s) are used to recommend items in another domain. Hence, producing meaningful recommendations depends on how well the assumptions on the source domain align with the operating environment. While the key is to transfer user interests from the source to the target domain, this problem has two characteristics: (1) the cold-start problem, i.e., shortage of information for new users or new items; and (2) the data sparsity problem, i.e., users generally rate only a limited number of items.

Traditionally, in the single-domain setting, latent factor models [8] are used to transform the users and items into a common latent feature space. Intuitively, the user factors encode the ‘preferences’ while the item factors encode the ‘properties’. However, the user and item latent factors have no interpretable meaning in natural language. Moreover, these techniques fail in cross-domain scenarios because the learned latent features may not align over different domains. Thus, understanding the semantic aspects of the latent factors is highly desirable in cross-domain research under cold-start and data sparsity contexts.

In the cross-domain scenario, tensor factorization models [7] try to represent the user-item-domain interaction with a tensor of order three and factorize users, items and domains into latent feature vectors. Essentially, these methods improve recommendations when the rating matrices from auxiliary domains share similar user-item rating patterns. However, the user behavior may not always be the same in all domains, and each user might have different domains of interest (see Fig. 1). Moreover, when the auxiliary information from multiple sources is combined, the learned latent features may sometimes degrade the quality of recommendations (see Section 6.2.1).

Figure 1: Illustration of cross domain scenario: Same users might have different rating preferences for two genres in movie recommendation system.


To address these shortcomings, we propose a method that can derive semantically meaningful latent factors in a fully automatic manner, and can hopefully improve the quality of recommendations. The major contributions of this work are:

    (1) We hypothesize that the intent of the users is not significantly different with respect to the document-specific and corpus-specific background information, and thus these components can be ignored when learning the latent factors.
    (2) We propose a collaborative filtering technique that segments the latent factors into semantic units and transfers only the useful components to the target domain (see Section 4).
    (3) We demonstrate the superiority of the proposed method in both single and cross-domain settings. Further, we show the consistency of our approach, due to improved latent features, under the cold-start and data sparsity contexts (see Section 6).

2 RELATED WORK
Among the latent factor models, matrix factorization (and its variants) [8] is a popular technique that tries to approximate an observed rating matrix to derive latent features. The basic principle is to find a common low-dimensional representation for both users and items, i.e., to reduce the rank of the user-item matrix directly. Nevertheless, the reduction approach addresses the sparsity problem by removing the unrepresentative or insignificant users or items. Discarding any useful information in this process may hinder further progress in this direction. To mitigate the cold-start problem, their variants [6] exploit user preferences or behavior from implicit feedback to improve personalized recommendations. Our work differs from these as we do not completely remove the content information. Instead, we assume that the content information about the items could be captured from multiple semantic aspects, which are good at explaining the factors that contributed to the user preferences.

Another popular latent factor model is one-class collaborative filtering [14], which tries to resolve the data sparsity problem by interpreting the missing ratings as a mixture of negative examples and unlabeled positive examples. Essentially, the task is to distinguish between a user's lack of interest in an item and the user's lack of awareness of the item. Alternatively, others exploit the user generated information [9, 19] or create shared knowledge from rating matrices in multiple domains [4, 13, 15, 16]. However, they have limited utility as users tend to show different rating patterns across the domains. In our work, we do not use the user generated content; we only use the user preferences along with the content information of items to learn the latent feature space.

While all these approaches try to mitigate the cold-start and data sparsity problems in the source domain, we focus on understanding the semantic aspects of the learned latent factors. Many methods [11, 18, 21, 23] have tried to adjust the latent factors according to the context. For example, in the job recommendation application, the cross-domain segmented model [18] introduces user-based domains to derive indicator features and segment the users into different domains. A projection based method [23] learns a projection matrix for each user that is able to capture the complexities of their preferences towards certain items over others. Our work differs from these in the sense that we consider the domains based on item features and exploit the contextual information (such as text) in order to understand the meaning of the latent factors.

In our research, we built an algorithm inspired by the collaborative topic modeling technique¹ [22], which can make recommendations by adjusting user preferences and content information. We combine this model with the special words with background model [3], which can account for the semantic aspects of texts from multiple domains. Thus, our model can transfer only the useful information to the target domain by improving the latent feature space. To validate our hypothesis, we conducted experiments in both single and cross-domain recommendations in extreme cold-start and data sparsity scenarios, and we reflect on the factors affecting the quality of recommendations.

¹ https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/

Table 1: Notations used in this paper

Description                                                                  Symbol
User node                                                                    $U$
Item node                                                                    $V$
Rating node                                                                  $r$
Number of users                                                              $N_u$
Regularization parameter for user                                            $\lambda_u$
Regularization parameter for item                                            $\lambda_v$
Number of documents (items)                                                  $D$
Number of topics                                                             $T$
Number of words in a document ($d$)                                          $N_d$
$n$-th word in the document ($d$)                                            $w_{dn}$
Topic assignment for document ($d$) and word ($w_{dn}$)                      $z_d$, $z_{dn}$
Switch variable                                                              $x$
Prob. of topics given documents ($T \times D$)                               $\theta$
Prob. of switch variable given document ($D \times 3$)                       $\lambda$
Prob. of words given special word distribution of document ($d$)             $\psi_d$
Prob. of words given topics ($W \times T$)                                   $\phi$
Prob. of words given background distribution                                 $\Omega$
Dirichlet prior on document-topic distribution, $\theta$ ($D \times T$)      $\alpha$
Dirichlet prior on switch variable distributions                             $\gamma$
Dirichlet prior on word-topic distribution ($\phi$)                          $\beta_0$
Dirichlet prior on special word distribution ($\psi$)                        $\beta_1$
Dirichlet prior on background word distribution ($\Omega$)                   $\beta_2$

3 PRELIMINARIES
Our method follows the same line as the collaborative topic regression model (CTR) [22], in the sense that latent factors of the content information are integrated with the user preferences. The main difference between our model and this approach is the way in which we derive meaningful topical latent factors from the content information and enable better predictions on recommendation tasks in general. For this we use the special words with background model [3]. Before we describe our model, we give a brief review of the existing latent factor models which serve as a basis for our approach. The notations for graphical models are given in Table 1.


Figure 2: Graphical representation of the latent topic models and recommendation models: (a) Probabilistic Matrix Factorization (PMF) [12], (b) Latent Dirichlet Allocation (LDA) [20], (c) Collaborative Topic Regression (CTRlda) [22], (d) Special Words with Background (SWB) [3], (e) proposed Divide and Transfer model (CTRswb). The hidden nodes (the topic proportions, assignments, and topics) are unshaded. The observed nodes (the ratings and words of the documents) are shaded.


3.1 Probabilistic Matrix Factorization
Matrix factorization models the user-item preference matrix as a product of two lower-rank user and item matrices. Given an observed matrix, the matrix factorization for collaborative filtering can be generalized as a probabilistic model (PMF) [12], which scales linearly with the number of observations. The graphical model for PMF is shown in Fig. 2a. In the figure, a user $i$ is represented by a latent vector $u_i \in \mathbb{R}^K$ and an item $j$ by a latent vector $v_j \in \mathbb{R}^K$, where $K$ is the dimension of the shared low-dimensional latent space.

3.2 Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) [1, 20] is known to be a powerful technique for discovering and exploiting the hidden thematic structure in large archives of text. The principle behind LDA is that documents exhibit multiple topics, and each topic can be viewed as a probability distribution over a fixed vocabulary. The richer structure in the latent topic space allows documents to be interpreted in low-dimensional representations. Fig. 2b depicts the graphical model for LDA, where the latent factors word-topic ($\phi$) and topic-document ($\theta$) are inferred from a given collection of documents.

3.3 Collaborative Topic Regression
The collaborative topic regression (CTRlda) model [22] combines the latent topics from LDA and the user-item features from PMF to jointly explain the observed content and user ratings, respectively. This model has two benefits over traditional approaches: (a) it generalizes to unseen or new items, and (b) it generates interpretable user profiles. The graphical model for CTRlda is shown in Fig. 2c. More details about this model are given in Section 4.

3.4 Special Word Topic Model
The special words with background model (SWB) [3] is based on LDA and models words in a document as originating either from general topics, or from document-specific word distributions, or from a corpus-wide background distribution. To achieve this, the SWB model introduces additional switch variables into the LDA model to account for multiple word distributions. The SWB model has a general structure similar to the LDA model, as shown in Fig. 2d. The main advantage of the SWB model is that it can trade off between generality and specificity of documents in a fully probabilistic and automated manner. An incremental version of this model exploits this feature to build an automatic term extractor [10].
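The three word-generation routes of SWB described in Section 3.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [3]: the helper name `sample_swb_document`, the toy sizes (4 topics, 50 vocabulary words, 30 words per document), and the prior values are all hypothetical choices made for the example.

```python
import numpy as np

def sample_swb_document(n_words, phi, omega, alpha, beta1, gamma, rng):
    """Sample one toy document with three word routes (illustrative only).

    phi:   (T, W) topic-word distributions shared across the corpus
    omega: (W,)   corpus-wide background word distribution
    """
    n_topics, vocab_size = phi.shape
    theta = rng.dirichlet(np.full(n_topics, alpha))   # document-topic proportions
    psi = rng.dirichlet(np.full(vocab_size, beta1))   # document-specific special words
    lam = rng.dirichlet(np.full(3, gamma))            # switch distribution over routes
    words, routes = [], []
    for _ in range(n_words):
        x = int(rng.choice(3, p=lam))                 # 0: topic, 1: special, 2: background
        if x == 0:
            z = rng.choice(n_topics, p=theta)         # pick a topic, then a word from it
            w = rng.choice(vocab_size, p=phi[z])
        elif x == 1:
            w = rng.choice(vocab_size, p=psi)         # document-specific route
        else:
            w = rng.choice(vocab_size, p=omega)       # corpus-wide background route
        words.append(int(w))
        routes.append(x)
    return words, routes, theta

# Toy setup: every value below is an assumption made for the illustration.
rng = np.random.default_rng(0)
T, W = 4, 50
phi = rng.dirichlet(np.full(W, 0.5), size=T)
omega = rng.dirichlet(np.full(W, 0.5))
words, routes, theta = sample_swb_document(
    30, phi, omega, alpha=0.5, beta1=0.5, gamma=1.0, rng=rng
)
```

In this sketch `routes` records which of the three distributions produced each word, while `theta` holds the proportions of the general-topic route only.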


4 DIVIDE AND TRANSFER LATENT TOPICS
Suppose we have a set of $I$ users, $J$ items, and the rating variable $r_{ij} \in \{0, 1\}$ that indicates whether user $i$ likes item $j$ or not. For each user, we want to recommend items that are previously unseen and potentially interesting. Traditionally, latent factor models try to learn a common latent space of the users and items, given an observed user-item preference matrix. Essentially, the recommendation problem minimizes the regularized squared error loss with respect to $(u_i)_{i=1}^{I}$ and $(v_j)_{j=1}^{J}$,

$$\min \sum_{i,j} (r_{ij} - u_i^T v_j)^2 + \lambda_u \|u_i\|^2 + \lambda_v \|v_j\|^2 \tag{1}$$

where $\lambda_u$ and $\lambda_v$ are regularization parameters. Probabilistic matrix factorization (PMF) [12] solves this problem by drawing the ratings for a given user-item pair from a Gaussian distribution given by

$$\hat{r}_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}) \tag{2}$$

where $c_{ij}$ is a confidence parameter for $r_{ij}$. In our work, we are interested in jointly modeling the user preferences and the content information to improve the quality of recommendations. We stick to the assumption that the content information from single/multiple domain(s) and users share a common latent topic space. Our model builds on the CTRlda [22] model, which can effectively balance the user preferences and the content information. This is achieved by including the latent variable $\epsilon_j$ that offsets the topic proportion $\theta_j$, i.e., $v_j = \theta_j + \epsilon_j$, where the item latent vector $v_j$ is close to the topic proportion $\theta_j$ derived from LDA and could diverge from it if it has to. Here, the expectation of the rating for a user-item pair is a simple linear function of $\theta_j$, i.e.,

$$E[r_{ij} \mid u_i, \theta_j, \epsilon_j] = u_i^T (\theta_j + \epsilon_j) \tag{3}$$

This explains how much of the prediction relies on content and how much it relies on how many users have rated an item. We propose a straightforward extension of CTRlda that replaces the topic proportions derived from LDA [1] with multiple semantic proportions derived from SWB [3] over the common topic space.

4.1 Graphical Model
Our model is based on placing additional latent variables into the CTRlda model that can account for semantic aspects of the latent factors. The graphical model of the divide and transfer model, referred to as ‘CTRswb’, is shown in Fig. 2e.

4.1.1 Deriving Semantic Factors. [3] As can be seen in the figure, the latent variable $x$, associated with each word, acts as a switch: when $x = 0$, the word is generated via the topic route; when $x = 1$, it is generated via the document-specific route; and for $x = 2$, it is generated via the background route, which is corpus specific. For the $x = 0$ case, like LDA, words are sampled from document-topic ($\theta$) and word-topic ($\phi$) multinomials with $\alpha$ and $\beta_0$ as respective symmetric Dirichlet priors. For $x = 1$ or $x = 2$, words are sampled from document-specific ($\psi$) or corpus-specific ($\Omega$) multinomials with $\beta_1$ and $\beta_2$ as symmetric Dirichlet priors, respectively. The variable $x$ is sampled from a document-specific multinomial $\lambda$, which in turn has a symmetric Dirichlet prior, $\gamma$. Since the words are sampled from multiple topic routes, our model can automatically deduce the latent features in a precise and meaningful manner.

4.2 Posterior Inference
The generative process is described in Algorithm 1, which combines the SWB [3] model (lines 1-14) and the PMF [12] model (lines 15-26). We summarize the repeated sampling of word distributions for each topic and user factors, and the predictions of user-item ratings.

4.2.1 Learning Parameters. Computing the full posterior of the parameters $u_i$, $v_j$, $\theta_j$ is intractable. Therefore, we adapt the EM-style algorithm, as in CTRlda [22], to learn the maximum-a-posteriori estimates. We refer the interested reader to CTRlda [22] for more details. It was shown there that fixing $\theta_j$ to the estimate obtained from vanilla LDA gives comparable performance. We discovered that the convergence of the EM algorithm improved significantly when $\theta_j$ from the SWB topic route is used as the initial estimate (see Fig. 3b).

4.2.2 Making Predictions. After learning (locally) all the parameters, subject to a convergence criterion, we can use the learned latent features $(u_i)_{i=1}^{I}$, $(v_j)_{j=1}^{J}$, $(\theta_j^*)_{j=1}^{J}$, $(\epsilon_j^*)_{j=1}^{J}$ in Eq. (3) for predictions. Note that for new or unseen items we do not have the offset value, i.e., $\epsilon_j = 0$; hence the prediction relies completely on the topic proportion derived from either the LDA or the SWB latent model.

4.2.3 Discussion. It is a common practice that the ratings of items are given on a scale, and the latent models try to predict the rating for a new user-item pair. In such cases, factorization machines [17] are known to work for a variety of general prediction tasks like classification, regression, or ranking. In our setup, the ratings are binary, i.e., $r_{ij} \in \{0, 1\}$, where $r_{ij} = 0$ can be interpreted in two ways: either $u_i$ is not interested in $v_j$, or $u_i$ does not know about $v_j$. In a way, our goals differ from the prediction tasks considered in factorization machines. Our study shows that we can make predictions for unseen items while deriving meaningful latent factors.

While making predictions for unseen items, it is important to see how effectively the models can fuse the content information from multiple sources. In our model, the semantic units are effective for the representation of latent factors and have advantages over the CTRlda model. While the user preferences across the domains are very different (see Fig. 1), the background word distribution is nearly the same across all items, and therefore its contribution towards $v_j$ is not significant. Additionally, the specific words that occur in the documents do not convey much information about the user preferences. Hence, we can discard the $\Omega$, $\psi$ distributions and only use the $\theta_j$ derived from the general topic route of the SWB [3] model. Subsequently, we demonstrate that CTRswb could learn better representations for the latent features compared to CTRlda [22].

5 EXPERIMENTS
We demonstrate the efficacy of our approach (CTRswb) in both single and cross-domain scenarios on the CiteULike dataset and the MovieLens dataset, respectively. For the single domain, we adopt the same experimental settings as those of CTRlda [22]. Since cross-domain applications can be realized in multiple ways [2], we consider the shared-users setup across multiple domains in two different contexts: (1) recommendations in the cold-start context, where we study the impact of the number of topics in the auxiliary domain(s), and (2) recommendations in the data sparsity context, where we study the impact of the number of ratings in the auxiliary domain(s).
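To make the prediction rule of Eq. (3) in Section 4 concrete, the sketch below scores one item for one user twice: once with a learned offset, and once as a new item whose offset is zero (the case discussed in Section 4.2.2). It is a toy illustration with random vectors. The function name `expected_rating`, the latent dimension, and all numeric values are assumptions for the example and do not come from the experiments.

```python
import numpy as np

def expected_rating(u, theta, eps=None):
    """Eq. (3): E[r_ij] = u_i^T (theta_j + eps_j).

    For a new or unseen item no offset has been learned, so eps_j = 0 and
    the score depends on the topic proportions alone.
    """
    if eps is None:
        eps = np.zeros_like(theta)
    return float(u @ (theta + eps))

# Toy values: hypothetical, chosen only to exercise the formula.
rng = np.random.default_rng(0)
K = 5
u = rng.normal(size=K)                 # user latent vector u_i
theta = rng.dirichlet(np.ones(K))      # topic proportions theta_j of item j
eps = rng.normal(scale=0.1, size=K)    # offset eps_j learned from ratings

in_matrix = expected_rating(u, theta, eps)   # item that already has ratings
cold_start = expected_rating(u, theta)       # new item: content only
```

The difference `in_matrix - cold_start` equals `u @ eps`, which is the part of the score that comes from observed ratings rather than from content.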


Algorithm 1: Generative process for CTRswb model

 1 Select a background distribution over words $\Omega \mid \beta_2 \sim \mathrm{Dir}(\beta_2)$
 2 for each topic $k \in 1, \ldots, T$ do
 3     Select a word distribution $\phi_k \mid \beta_0 \sim \mathrm{Dir}(\beta_0)$
 4 end
 5 for each document $d \in 1, \ldots, D$ do
 6     Select a distribution over topics $\theta_d \mid \alpha \sim \mathrm{Dir}(\alpha)$
 7     Select a special-words distribution over words $\psi_d \mid \beta_1 \sim \mathrm{Dir}(\beta_1)$
 8     Select a distribution over switch variables $\lambda_d \mid \gamma \sim \mathrm{Dir}(\gamma)$
 9     for $n = 1 : N_d$ words in document $d$ do
10         Select a switch variable $x_{dn} \mid \lambda_d \sim \mathrm{Mult}(\lambda_d)$
11         Select $z_{dn} \mid \{\theta_d, x_{dn}\} \sim \mathrm{Mult}(\theta_d)^{\delta(x_{dn},0)} \, \delta(z_{dn}, SW)^{\delta(x_{dn},1)} \, \delta(z_{dn}, BG)^{\delta(x_{dn},2)}$
12         Generate a word: $w_{dn} \mid \{z_{dn}, x_{dn}, \phi, \psi_d, \Omega\} \sim \mathrm{Mult}(\phi_{z_{dn}})^{\delta(x_{dn},0)} \, \mathrm{Mult}(\psi_d)^{\delta(x_{dn},1)} \, \mathrm{Mult}(\Omega)^{\delta(x_{dn},2)}$
13     end
14 end
15 for user $i \in 1, \ldots, N_u$ do
16     Draw $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_T)$
17 end
18 for item $j \in 1, \ldots, D$ do
19     Draw $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_T)$
20     Compute $v_j = \epsilon_j + \theta_j$
21 end
22 for user $i \in 1, \ldots, N_u$ do
23     for item $j \in 1, \ldots, D$ do
24         Draw $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij})$
25     end
26 end

5.1 CiteULike Dataset
We conducted experiments in the single domain using a dataset from CiteULike², a free social network service for scholars which allows users to organize (in personal libraries) and share the papers they are reading. We use the metadata of CiteULike from [22], collected between 2004 and 2010. The dataset contains 204,986 pairs of observed ratings with 5,551 users and 16,980 articles. Each user has 37 articles in their library on average, and only 7% of the users have more than 100 articles. That is, the density of the dataset is quite low: 0.2175%. Each item (article) is represented by its title and abstract. After preprocessing the corpus, 8,000 unique words are generated as the vocabulary.

to get item description for the movies. The basic statistics of the dataset collected are reported in Table 2.

Table 2: MovieLens 1M Data statistics

Movie Genre     No. Items     No. Users     No. Ratings     Rating Ratio
Drama           1,493         5,881         352,834         0.040
Comedy          1,163         5,881         354,455         0.052
Thriller        485           5,881         188,968         0.066
Romance         459           5,881         146,916         0.054
Action          495           5,881         256,515         0.088
Total           4,095         5,881         1,299,688       0.054

5.3 Evaluation Methodology and Metrics
We evaluate the recommendation tasks by using the standard performance metrics: Precision, Recall and Mean Average Precision (MAP). The results shown are averaged over all the users. In our studies, we set the parameters of PMF and CTRlda by referring to [22]. For PMF, $\lambda_u = \lambda_v = 0.01$, $a = 1$, $b = 0.01$. For the CTRlda model, $T = 200$, $\lambda_u = 0.01$, $\lambda_v = 100$, $a = 1$, $b = 0.01$. For the CTRswb model, we set $\alpha = 0.1$, $\beta_0 = \beta_2 = 0.01$, $\beta_1 = 0.0001$, $\gamma = 0.3$ (all weak symmetric priors are set to default), $T = 200$, $\lambda_u = 0.01$, $\lambda_v = 100$, $a = 1$, $b = 0.01$.

6 RESULTS AND DISCUSSION

6.1 Study I: Single Domain recommendations
In this set of experiments, we compare the performance of probabilistic matrix factorization (PMF), the CTR model [22], which makes use of latent topics from LDA (CTRlda), and the proposed CTRswb model. Fig. 3a shows our results on the CiteULike dataset under the settings defined in [22]. In the graph, we also show how the topic proportions from LDA and SWB alone (i.e., when the user rating patterns from the train set are not considered) make predictions on the test set for top-K (from 20 to 300) recommendations.

We can see that CTRswb consistently gives better recommendations than the other factor models for different numbers of recommendations. Moreover, the margin of improvement between the CTRswb and CTRlda methods is large for smaller numbers of recommendations. Clearly, the PMF model lacks the content information and the pure content-based models do not utilize user preferences, and therefore both under-perform relative to the CTR-based models.

Further, we also show the performance of the CTR-based methods when subjected to iterative optimization of the parameter $\theta_j$. We observe that the CTRswb model converges faster than the CTRlda model, as plotted in Fig. 3b. Clearly, the error gap analysis shows that the latent topics transferred from SWB model are
5.2     MovieLens Dataset                                                                   in agreement with the consistent performance improvement of
To conduct the experiments in cross-domain, we have used the                                CTRswb methods over the CTRlda.
dataset provided by Grouplens [5]. We extracted five genres with                               In Fig. 3c, we show the performance of CTR based methods both
most ratings out of the 19 genres from the 1 million movielens                              with and without θ j optimization. The reason for CTRswb method
dataset: Action, Comedy, Drama, Romance, Thriller. Since the movie-                         giving the best performance, in both cases, is that in the real world
lens dataset has only user generated tags, we crawled the IMDB 3                            item descriptions there will be lot of item specific terms, which will
                                                                                            not be that much helpful for the recommendations. By removing
2 http://www.citeulike.org/                                                                 the background terms of the corpus and specific terms from each
3 http://www.imdb.com
                                                                                            items, we could aggregate the θ j value in a precise manner.
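
   To make the role of θ_j in the item factors concrete, the sampling steps 15-26 of the generative process can be sketched in NumPy as follows. This is only an illustrative sketch: the function name sample_ctr_ratings and the toy sizes are hypothetical, and c_ij is assumed here to act as a precision (rating variance 1/c_ij), following the convention of [22].

```python
import numpy as np

def sample_ctr_ratings(theta, n_users, lambda_u=0.01, lambda_v=100.0, conf=1.0, seed=0):
    """Sample user factors, item factors and ratings (steps 15-26).

    theta : (D, T) array of item topic proportions.
    conf  : scalar or (n_users, D) array standing in for c_ij; ASSUMED to be
            a precision, so the rating variance is 1 / c_ij.
    """
    rng = np.random.default_rng(seed)
    n_items, n_topics = theta.shape
    # Steps 15-17: u_i ~ N(0, lambda_u^{-1} I_T)
    U = rng.normal(0.0, np.sqrt(1.0 / lambda_u), size=(n_users, n_topics))
    # Steps 18-21: eps_j ~ N(0, lambda_v^{-1} I_T), then v_j = eps_j + theta_j
    eps = rng.normal(0.0, np.sqrt(1.0 / lambda_v), size=(n_items, n_topics))
    V = eps + theta
    # Steps 22-26: r_ij drawn around the inner product u_i^T v_j
    mean = U @ V.T
    std = np.sqrt(1.0 / np.broadcast_to(conf, mean.shape))
    R = rng.normal(mean, std)
    return U, V, R

# Toy usage: 6 items with 4 topics each, 5 users (sizes are arbitrary).
theta = np.random.default_rng(1).dirichlet(np.ones(4), size=6)
U, V, R = sample_ctr_ratings(theta, n_users=5)
```

   With λ_v = 100 the offsets ϵ_j have a standard deviation of 0.1, so each v_j stays close to its topic proportions θ_j; this is why the quality of θ_j matters for the learned item factors.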


[Figure 3 panels: (a) Recall measure; (b) Convergence curve; (c) θ optimization]

Figure 3: CiteULike dataset in single-domain recommendations: (a) Comparison of different recommendation algorithms in terms of recall. (b) Convergence curves of CTRlda and CTRswb w.r.t. the number of iterations during θ optimization. (c) Performance of the CTR-based methods w.r.t. θ optimization in terms of recall. (Best viewed in color)
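
   The recall measure reported in Figure 3 can be sketched as follows, assuming the per-user definition used in [22]: the number of items a user likes among the top K recommendations, divided by the total number of items that user likes, averaged over users. The function name recall_at_k and the toy score and label matrices are hypothetical, and the sketch does not exclude training items from the ranking.

```python
import numpy as np

def recall_at_k(scores, liked, k):
    """Mean recall@k over users that have at least one liked item.

    scores : (n_users, n_items) predicted ratings.
    liked  : (n_users, n_items) boolean matrix of held-out liked items.
    """
    # Indices of the k highest-scoring items for every user.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # Count how many of those items the user actually likes.
    hits = np.take_along_axis(liked, top_k, axis=1).sum(axis=1)
    n_liked = liked.sum(axis=1)
    valid = n_liked > 0
    return float(np.mean(hits[valid] / n_liked[valid]))

# Toy example with 2 users and 4 items.
scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.4, 0.6]])
liked = np.array([[True, False, False, True],
                  [False, True, False, True]])
recall_2 = recall_at_k(scores, liked, k=2)  # user 0: 1/2, user 1: 2/2
recall_4 = recall_at_k(scores, liked, k=4)
```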


6.2     Study II: Cross Domain recommendations
In the cross-domain setting, we consider every genre in the dataset as a target domain, while the other genres are treated as its auxiliary domains. For example, if the “Action” genre is the target domain, the other four genres constitute the source domains.
   6.2.1 Cold-start scenario and the impact of number of topics: In this study, we consider the scenario in which no rating information from the target domain is available while learning the latent topic features. From Table 2, we pick one of the genres as the target domain and create five cold-start scenarios (one for each genre in the dataset). We ran the algorithms PMF, LDA, SWB, CTRlda, and CTRswb for each of the cold-start situations.
   Figs. 4a–4e show the mean average precision (MAP) of the top-20 recommendations for the five target genres. We can see that the MAP score of the PMF model does not improve much when the number of latent factors is increased. Notice that, in many cases, the CTRlda method degrades the quality of recommendations compared to traditional PMF. Moreover, CTRlda is highly sensitive to the number of latent factors, and it consistently performs worse than CTRswb. A potential reason is that the learned topics are obtained by feature fusion from multiple domains. The CTRswb approach explicitly models these aspects and is therefore able to improve the latent features. As the figures show, our model consistently produces better recommendations for different numbers of latent factors.
   Fig. 4f shows the performance averaged over all genres. From the plot, we observe that using 80 latent factors gives the best performance for all genres except Comedy. The deviation in the case of the Comedy genre is expected, as the number of items in its source domains is relatively small. Table 3 shows the performance of the different recommendation algorithms when 80 latent topics are used. The proposed CTRswb model improves over CTRlda and the other methods in all the cold-start scenarios (e.g., a MAP@20 of 0.306 versus 0.099 for CTRlda on the Action genre).

Table 3: MovieLens dataset: comparison of different recommendation algorithms in terms of MAP, Precision (P), and Recall (R). Here, we show the performance with 80 latent factors for all five cold-start scenarios. For every target domain, the best performance on each metric is attained by CTRswb.

Genre      Method    MAP@20    P@10     P@20     R@10     R@20
Action     PMF       0.133     0.072    0.069    0.013    0.024
           LDA       0.057     0.025    0.026    0.005    0.01
           SWB       0.244     0.136    0.11     0.035    0.052
           CTRlda    0.099     0.061    0.057    0.013    0.025
           CTRswb    0.306     0.176    0.14     0.051    0.07
Comedy     PMF       0.101     0.05     0.049    0.008    0.014
           LDA       0.073     0.024    0.027    0.004    0.009
           SWB       0.122     0.059    0.05     0.009    0.016
           CTRlda    0.059     0.029    0.026    0.007    0.012
           CTRswb    0.147     0.074    0.061    0.011    0.018
Drama      PMF       0.09      0.039    0.038    0.006    0.012
           LDA       0.075     0.027    0.027    0.004    0.009
           SWB       0.1       0.044    0.041    0.009    0.016
           CTRlda    0.024     0.011    0.013    0.001    0.004
           CTRswb    0.235     0.07     0.055    0.02     0.026
Romance    PMF       0.099     0.048    0.046    0.015    0.028
           LDA       0.038     0.012    0.014    0.004    0.009
           SWB       0.094     0.029    0.025    0.024    0.037
           CTRlda    0.056     0.036    0.024    0.022    0.027
           CTRswb    0.367     0.084    0.06     0.061    0.07
Thriller   PMF       0.127     0.063    0.06     0.016    0.029
           LDA       0.076     0.035    0.028    0.012    0.018
           SWB       0.079     0.041    0.04     0.01     0.02
           CTRlda    0.084     0.038    0.031    0.016    0.027
           CTRswb    0.162     0.09     0.073    0.022    0.034
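
   The metrics of Table 3 can be computed per user as in the following sketch and then averaged over users (the mean of AP@K over users gives MAP@K). The helper name and the toy ranking are hypothetical. Normalizing AP@K by min(K, number of relevant items) is one common convention that is assumed here; the normalization actually used for Table 3 is not stated in this section.

```python
import numpy as np

def precision_recall_ap_at_k(ranked_items, relevant, k):
    """Return (P@k, R@k, AP@k) for a single user.

    ranked_items : list of item ids, best first.
    relevant     : set of item ids the user actually likes.
    """
    top = ranked_items[:k]
    # 1.0 where the recommended item is relevant, 0.0 otherwise.
    hits = np.array([item in relevant for item in top], dtype=float)
    precision = float(hits.sum() / k)
    recall = float(hits.sum() / len(relevant)) if relevant else 0.0
    # Precision at every rank, counted only at ranks that are hits.
    prec_at_rank = np.cumsum(hits) / np.arange(1, len(top) + 1)
    denom = min(k, len(relevant))  # ASSUMED normalization of AP@k
    ap = float((prec_at_rank * hits).sum() / denom) if denom else 0.0
    return precision, recall, ap

# Toy example: items 1 and 0 are relevant and appear at ranks 2 and 4.
p, r, ap = precision_recall_ap_at_k([3, 1, 4, 0, 2], {0, 1}, k=4)
```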


[Figure 4 panels: (a) Action; (b) Comedy; (c) Drama; (d) Romance; (e) Thriller; (f) Mean of all genres]

Figure 4: MovieLens dataset in cross-domain recommendations: impact of the size of the latent topic space on the quality of recommendations. Here, we use one of the genres as the target domain and the remaining four as source domains. (Best viewed in color)


   6.2.2 Data sparsity scenario and the impact of number of ratings: In this study, to explore the behavior of cross-domain recommendation, we examine the latent topic space under data sparsity. We use the same MovieLens data as in Table 2 and create 10 data sparsity situations by incrementally removing a random 10% of the ratings from the source genres. Throughout, as in the cold-start study, we do not use ratings from the target genre. To validate our findings, we show the evaluations only for the topic space of 80 latent factors. Figs. 5a–5e show the mean average precision of the top-20 recommendations for different degrees of sparsity (rating ratio) in the source domains.
   The effect of the number of ratings is much clearer and more straightforward than the effect of the number of latent factors. The results reveal that the number of ratings in the source genres has a significant impact on accuracy. However, the scale of the impact differs considerably across target domains, as some genres have fewer ratings. The plots show that the more user preferences are available in the auxiliary domains, the better the accuracy of recommendations in the target domain. As the number of ratings increases, PMF, LDA, SWB and CTRlda show moderate improvements in terms of MAP. Our approach consistently shows better performance, by a large margin, than these methods. Overall, the results show that the latent factors of CTRswb are very reliable and can improve the recommendations even under extremely sparse data scenarios.

7   CONCLUSIONS
We have proposed an approach to validate our hypothesis that the quality of recommendations can be improved by explicitly utilizing the general topic-word distributions while learning the latent features. Our approach recommends items to users based on both content and user preferences, and can make the best use of the content information in both single- and cross-domain scenarios. Our results in the single-domain setting show its superiority over pure latent factor and CTRlda models, and our results in the cross-domain setting demonstrate its robustness under cold-start and data sparsity situations.
   In the future, we plan to explore cross-domain recommendation scenarios in heterogeneous settings (e.g., movies to books). In addition, we have used a simple collaborative filtering approach with no rating information from the target domain; we believe that utilizing the target-domain ratings could result in better cross-domain recommendations.


[Figure 5 panels: (a) Action; (b) Comedy; (c) Drama; (d) Romance; (e) Thriller; (f) Mean of all genres]

Figure 5: MovieLens dataset in cross-domain recommendations: impact of the amount of ratings from the source domains on the quality of recommendations. Here, we show the stability of the CTRswb method when subjected to both cold-start (w.r.t. the target domain) and data sparsity (w.r.t. the source domains) scenarios. (Best viewed in color)
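
   The source-domain sparsity settings behind Figure 5 (created by incrementally removing a random 10% of the source ratings) could be generated along the lines of the sketch below. The function name sparsity_levels is hypothetical, and two points are assumptions rather than details given in the text: the levels are taken to be nested (each sparser level is a subset of the previous one), and the ten levels are taken to keep 100%, 90%, ..., 10% of the ratings.

```python
import numpy as np

def sparsity_levels(n_ratings, n_levels=10, seed=0):
    """Map a kept-rating percentage to the sorted indices of kept ratings."""
    rng = np.random.default_rng(seed)
    # One random order shared by all levels makes the subsets nested.
    order = rng.permutation(n_ratings)
    levels = {}
    for step in range(n_levels):
        pct = 100 * (n_levels - step) // n_levels  # 100, 90, ..., 10
        n_keep = n_ratings * pct // 100
        levels[pct] = np.sort(order[:n_keep])
    return levels

# Toy usage on 1000 source-domain ratings.
levels = sparsity_levels(1000)
```

   Each index set would select the source-genre ratings used for training at that level, while ratings from the target genre remain unused throughout.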


REFERENCES
 [1] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research (2003).
 [2] Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Cross-domain recommender systems. In Recommender Systems Handbook. Springer.
 [3] Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers. 2007. Modeling general and specific aspects of documents with a probabilistic topic model. In NIPS.
 [4] Wei Chen, Wynne Hsu, and Mong Li Lee. 2013. Making recommendations from multiple domains. In ACM SIGKDD.
 [5] F Maxwell Harper and Joseph A Konstan. 2016. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) (2016).
 [6] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM.
 [7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. 2010. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In ACM RecSys.
 [8] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009).
 [9] Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Transfer learning for collaborative filtering via a rating-matrix generative model. In ICML. ACM.
[10] Sujian Li, Jiwei Li, Tao Song, Wenjie Li, and Baobao Chang. 2013. A novel topic model for automatic term extraction. In ACM SIGIR.
[11] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM RecSys.
[12] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS.
[13] Orly Moreno, Bracha Shapira, Lior Rokach, and Guy Shani. 2012. Talmud: transfer learning for multiple domains. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management.
[14] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In ICDM.
[15] Weike Pan, Nathan N Liu, Evan W Xiang, and Qiang Yang. 2011. Transfer learning to predict missing ratings via heterogeneous user feedbacks. In IJCAI.
[16] Weike Pan, Evan Wei Xiang, Nathan Nan Liu, and Qiang Yang. 2010. Transfer learning in collaborative filtering for sparsity reduction. In AAAI.
[17] Steffen Rendle. 2010. Factorization machines. In ICDM.
[18] Shaghayegh Sahebi and Trevor Walker. 2014. Content-based cross-domain recommendations using segmented models. CBRecSys (2014).
[19] Yue Shi, Martha Larson, and Alan Hanjalic. 2011. Tags as bridges between domains: Improving recommendation with tag-induced cross-domain collaborative filtering. In International Conference on User Modeling, Adaptation, and Personalization. Springer.
[20] Mark Steyvers and Tom Griffiths. 2007. Probabilistic topic models. Handbook of Latent Semantic Analysis (2007).
[21] Fatemeh Vahedian and Robin D Burke. Predicting component utilities for linear-weighted hybrid recommendation.
[22] Chong Wang and David M Blei. 2011. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD.
[23] Tong Zhao, Julian McAuley, and Irwin King. 2015. Improving latent factor models via personalized feature projection for one class recommendation. In ACM CIKM.

