<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Divide and Transfer: Understanding Latent Factors for Recommendation Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vidyadhar Rao</string-name>
          <email>vidyadhar.rao@tcs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rosni K V∗</string-name>
          <email>rosnikv@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vineet Padmanabhan</string-name>
          <email>vineetcs@uohyd.ernet.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TCS Research Labs</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hyderabad</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>27</volume>
      <issue>2017</issue>
      <abstract>
        <p>Traditionally, latent factor models have been the most successful techniques for building recommendation systems. While the key is to capture user interests effectively, most research focuses on learning latent factors under cold-start and data sparsity situations. Our work complements previous studies by showing that understanding the semantic aspects of latent factors can give a hint on how to transfer useful knowledge from auxiliary domain(s) to the target domain. In this work, we propose a collaborative filtering technique that can effectively utilize the user preferences and content information. In our approach, we follow a divide and transfer strategy that derives semantically meaningful latent factors and utilizes only the appropriate components for recommendations. We demonstrate the effectiveness of our approach, which stems from an improved latent feature space, in both single and cross-domain tasks. Further, we show its robustness through extensive experiments under cold-start and data sparsity contexts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems;
Collaborative filtering; • Computing methodologies → Topic
modeling; Learning latent representations;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Most e-commerce businesses want to help their
customers surf through items that might interest them. Some examples
include recommending books by Goodreads, products by Amazon,
movies by Netflix, music by Last.fm, news articles by Google, etc.
The most popular recommender systems follow two paradigms:
collaborative filtering (CF), which utilizes the preferences of a group of
users to suggest items to other users; and content-based
filtering, which recommends items that are similar to those that a user liked
in the past. In general, effective recommendations are obtained by
combining both content-based and collaborative features.</p>
      <p>While many of these approaches have been shown to be effective in
a single domain, in practice they have to operate in more
challenging cross-domain environments and still deliver desirable
recommendations. For example, based on a user’s watched list of
movies, it may be easy to recommend upcoming new movies, but
how do we recommend books that have similar plots? Typically, in the
single-domain setting, user preferences from only one domain are used to
recommend items within the same domain, and in the cross-domain setting,
user preferences from auxiliary domain(s) are used to recommend
items in another domain. Hence, producing meaningful
recommendations depends on how well the assumptions on the source domain
align with the operating environment. While the key is to transfer
user interests from the source to the target domain, this problem has two
characteristics: (1) the cold-start problem, i.e., shortage of information
for new users or new items; and (2) the data sparsity problem, i.e., users
generally rate only a limited number of items.</p>
      <p>∗The work was conducted during an internship at TCS Research Labs, India.</p>
      <p>
        Traditionally, in single-domain, the latent factor models [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are
used to transform the users and items into a common latent feature
space. Intuitively, users’ factors encode the ‘preferences’ while the
item factors encode the ‘properties’. However, the user and item
latent factors have no interpretable meaning in natural language.
Moreover, these techniques fail in cross-domain scenarios
because the learned latent features may not align across different
domains. Thus, understanding the semantic aspects of the latent factors
is highly desirable in cross-domain research under cold-start and data
sparsity contexts.
      </p>
      <p>
        In cross domain scenario, tensor factorization models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] try to
represent the user-item-domain interaction with a tensor of order
three and factorize users, items, and domains into latent feature
vectors. Essentially, these methods improve recommendations when
the rating matrices from auxiliary domains share similar user-item
rating patterns. However, user behavior may not be the same in all
domains, and each user might have different domains of
interest (see Fig. 1). Moreover, when auxiliary information from
multiple sources is combined, the learned latent features may sometimes
degrade the quality of recommendations (see Section 6.2.1).
To address these shortcomings, we propose a method that can
derive semantically meaningful latent factors in a fully automatic
manner, with the aim of improving the quality of recommendations.
The major contributions of this work are:
(1) We hypothesize that the intent of the users does not differ
significantly with respect to the document-specific and
corpus-specific background information, and thus these can
be ignored when learning the latent factors.
(2) We propose a collaborative filtering technique that segments
the latent factors into semantic units and transfers only the useful
components to the target domain (see Section 4).
(3) We demonstrate the superiority of the proposed method
in both single and cross-domain settings. Further, we show the
consistency of our approach, due to improved latent features,
under cold-start and data sparsity contexts (see Section 6).
      </p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Among the latent factor models, matrix factorization (and its
variants) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a popular technique that tries to approximate an
observed rating matrix to derive latent features. The basic principle
is to find a common low-dimensional representation for both users
and items, i.e., to reduce the rank of the user-item matrix directly.
Nevertheless, the reduction approach addresses the sparsity problem by
removing the unrepresentative or insignificant users or items.
Discarding any useful information in this process may hinder further
progress in this direction. To mitigate the cold-start problem, their
variants [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] exploit user preferences or behavior from implicit
feedback to improve personalized recommendations. Our work differs
from these as we do not completely remove the content information.
Instead, we assume that the content information about the items
can be captured from multiple semantic aspects, which are good
at explaining the factors that contributed to the user preferences.
      </p>
      <p>
        Another popular latent factor model is one-class collaborative
filtering [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that tries to resolve the data sparsity problem by
interpreting the missing ratings as a mixture of negative examples and
unlabeled positive examples. Essentially, the task is to distinguish
between a user’s lack of interest in an item and the user’s lack of awareness
of the item. Alternatively, others exploit the user-generated
information [
        <xref ref-type="bibr" rid="ref19 ref9">9, 19</xref>
        ] or create a shared knowledge from rating matrices in
multiple domains [
        <xref ref-type="bibr" rid="ref13 ref15 ref16 ref4">4, 13, 15, 16</xref>
        ]. However, they have limited utility
as users tend to show different rating patterns across the domains.
      </p>
      <p>In our work, we do not use the user generated content, and we
only use the user preferences along with the content information
of items to learn the latent feature space.</p>
      <p>
        While all these approaches try to mitigate the cold-start and
data sparsity problems in the source domain, we focus on understanding
the semantic aspects of the learned latent factors. Many methods
tried [
        <xref ref-type="bibr" rid="ref11 ref18 ref21 ref23">11, 18, 21, 23</xref>
        ] to adjust the latent factors according to the
context. For example, in the job recommendation application, the
cross-domain segmented model [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] introduces user-based domains
to derive indicator features and segment the users into different
domains. A projection-based method [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] learns a projection
matrix for each user that is able to capture the complexities of their
preferences towards certain items over others. Our work differs
from these in the sense that we consider the domains based on item
features and exploit the contextual information (such as text) in
order to understand the meaning of the latent factors.
      </p>
      <p>
        In our research, we built an algorithm inspired by the
collaborative topic modeling technique¹ [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] that can make recommendations
by adjusting user preferences and content information. We
combine this model with the special words with background (SWB) model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
that can account for the semantic aspects of texts from multiple
domains. Thus, our model can transfer only the useful
information to the target domain by improving the latent feature space. To
validate our hypothesis, we conducted experiments in both single
and cross-domain recommendations in extreme cold-start and data
sparsity scenarios, and reflect on the factors affecting the quality
of recommendations.
      </p>
      <p>
Our method follows the same line as the collaborative topic
regression model (CTR) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], in the sense that latent factors of the
content information are integrated with the user preferences. The
main difference between our model and this approach is the way in
which we derive meaningful topical latent factors from the
content information and enable better predictions on recommendation
tasks in general. For this we use the special words with background (SWB)
model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Before we describe our model, we give a brief review
of the existing latent factor models which serve as a basis for our
approach. The notations for graphical models are given in Table 1.
      </p>
      <p>¹ https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-timesrecommendation-engine/</p>
      <p>Figure 2 (graphical models), panels: (a) Probabilistic Matrix Factorization (PMF) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; (b) Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]; (c) Collaborative Topic Regression (CTRlda) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]; (d) Special Words with Background (SWB) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; (e) Proposed Divide and Transfer Model (CTRswb).</p>
      <p>
Matrix factorization models the user-item preference matrix as
a product of two lower-rank user and item matrices. Given an
observed matrix, the matrix factorization for collaborative filtering
can be generalized as a probabilistic model (PMF) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which scales
linearly with the number of observations. The graphical model for
PMF is shown in Fig. 2a. In the figure, a user i is represented by
a latent vector u_i ∈ R^K and an item j by a latent vector v_j ∈ R^K,
where K is the dimensionality of the shared latent space.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Latent Dirichlet Allocation</title>
      <p>
        Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref1 ref20">1, 20</xref>
        ] is known to be a
powerful technique for discovering and exploiting the hidden thematic
structure in large archives of text. The principle behind LDA is that
documents exhibit multiple topics, and each topic can be viewed
as a probability distribution over a fixed vocabulary. The richer
structure in the latent topic space allows documents to be interpreted
through low-dimensional representations. Fig. 2b depicts the graphical
model for LDA, where the latent factors word-topic (ϕ)
and topic-document (θ) are inferred from a given collection of
documents.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Collaborative Topic Regression</title>
      <p>
        Collaborative topic regression (CTRlda) model [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] combines the
latent topics from LDA and the user-item features from PMF to
jointly explain the observed content and user ratings, respectively.
This model has two benefits over traditional approaches: (a) it
generalizes to unseen or new items, and (b) it generates interpretable user
profiles. The graphical model for CTRlda is shown in Fig. 2c.
More details about this model are given in Section 4.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Special Word Topic Model</title>
      <p>
        Special words with background model (SWB) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is based on LDA
which models words in a document as either originating from
general topics, or from document-specific word distributions, or
from a corpus-wide background distribution. To achieve this, the
SWB model introduces additional switch variables into the LDA
model to account for multiple word distributions. The SWB model
has a general structure similar to that of the LDA model, as shown in Fig. 2d.
The main advantage of the SWB model is that it can trade off between
generality and specificity of documents in a fully probabilistic and
automated manner. An incremental version of this model exploits
this feature to build an automatic term extractor [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>DIVIDE AND TRANSFER LATENT TOPICS</title>
      <p>Consider a set of I users and J items, and a rating variable
r_ij ∈ {0, 1} that indicates whether user i likes item j. For each
user, we want to recommend items that are previously unseen and
potentially interesting. Traditionally, latent factor models try
to learn a common latent space of users and items, given an
observed user-item preference matrix. Essentially, the
recommendation problem minimizes the regularized squared error loss with
respect to (u_i)_{i=1}^{I} and (v_j)_{j=1}^{J},
        <disp-formula><label>(1)</label><tex-math>\min \sum_{i,j} (r_{ij} - u_i^T v_j)^2 + \lambda_u \|u_i\|^2 + \lambda_v \|v_j\|^2</tex-math></disp-formula>
      </p>
      <p>
where λ_u and λ_v are regularization parameters. Probabilistic matrix
factorization (PMF) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] solves this problem by drawing the ratings
for a given user-item pair from a Gaussian distribution given by
        <disp-formula><label>(2)</label><tex-math>\hat{r}_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij})</tex-math></disp-formula>
where c_ij is a confidence parameter for r_ij. In our work, we are
interested in jointly modeling the user preferences and the content
information to improve the quality of recommendations. We adhere
to the assumption that the content information from single/multiple
domain(s) and the users share a common latent topic space. Our model
builds on the CTRlda [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] model, which can effectively balance the
user preferences and the content information. This is achieved by
including the latent variable ϵ_j that offsets the topic proportion θ_j,
i.e., v_j = θ_j + ϵ_j, where the item latent vector v_j is close to the topic
proportion θ_j derived from LDA and can diverge from it if it has
to. Here, the expectation of the rating for a user-item pair is a simple
linear function of θ_j, i.e.,
      </p>
      <p>
        <disp-formula><label>(3)</label><tex-math>E[r_{ij} \mid u_i, \theta_j, \epsilon_j] = u_i^T (\theta_j + \epsilon_j)</tex-math></disp-formula>
This explains how much of the prediction relies on content and
how much it relies on how many users have rated an item. We
propose a straightforward extension of CTRlda that replaces the
topic proportions derived from LDA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with multiple semantic
proportions derived from SWB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] over the common topic space.
      </p>
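      <p>The loss in Eq. (1) and the prediction rule in Eq. (3) can be illustrated with a minimal sketch. The function names below are hypothetical and not taken from the authors' implementation, and the sketch assumes that each regularization term is counted once per user and once per item.</p>
      <preformat preformat-type="code">
```python
# Illustrative sketch only: function names are assumptions, not the authors' code.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def item_vector(theta_j, eps_j):
    """Item latent vector v_j = theta_j + eps_j (topic proportion plus offset)."""
    return [t + e for t, e in zip(theta_j, eps_j)]


def expected_rating(u_i, theta_j, eps_j):
    """Eq. (3): E[r_ij | u_i, theta_j, eps_j] = u_i^T (theta_j + eps_j)."""
    return dot(u_i, item_vector(theta_j, eps_j))


def regularized_loss(R, U, V, lam_u, lam_v):
    """Eq. (1): squared error over all (i, j) pairs plus L2 penalties.

    Assumption: each penalty is summed once per user and once per item.
    """
    err = sum((R[i][j] - dot(U[i], V[j])) ** 2
              for i in range(len(U)) for j in range(len(V)))
    reg_u = lam_u * sum(dot(u, u) for u in U)
    reg_v = lam_v * sum(dot(v, v) for v in V)
    return err + reg_u + reg_v
```
      </preformat>
      <p>With a zero offset, expected_rating reduces to the inner product of the user vector and the topic proportion, which is the case used for unseen items.</p>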
    </sec>
    <sec id="sec-8">
      <title>Graphical Model</title>
      <p>Our model is based on placing additional latent variables into the
CTRlda model that can account for semantic aspects of the latent
factors. The graphical model of the divide and transfer model, referred
to as ‘CTRswb’, is shown in Fig. 2e.</p>
      <p>
        4.1.1 Deriving Semantic Factors. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] As can be seen in the figure,
the latent variable x, associated with each word, acts as a switch:
when x = 0, the word is generated via the topic route; when x = 1,
it is generated via the document-specific route; and when x = 2, it is
generated via the background route, which is corpus-specific. For the x = 0
case, as in LDA, words are sampled from document-topic (θ) and
word-topic (ϕ) multinomials with α and β0 as the respective symmetric
Dirichlet priors. For x = 1 or x = 2, words are sampled from
document-specific (ψ) or corpus-specific (Ω) multinomials with β1
and β2 as symmetric Dirichlet priors, respectively. The variable x
is sampled from a document-specific multinomial λ, which in turn
has a symmetric Dirichlet prior γ. Since the words are sampled
from multiple routes, our model can automatically deduce the
latent features in a precise and meaningful manner.
      </p>
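      <p>The three-route switch described above can be sketched as a small sampling routine. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name sample_word is hypothetical, all distributions are passed in as plain lists of probabilities, and the Dirichlet priors are assumed to have been sampled beforehand.</p>
      <preformat preformat-type="code">
```python
import random


def sample_word(rng, lam, theta, phi, psi_d, omega):
    """Draw one (route, word id) pair using the switch variable x.

    lam    : probabilities of routes x = 0 (topic), 1 (document-specific),
             2 (corpus background)
    theta  : document-topic proportions
    phi    : list of per-topic word distributions
    psi_d  : document-specific word distribution
    omega  : corpus-wide background word distribution
    """
    vocab = range(len(omega))
    x = rng.choices([0, 1, 2], weights=lam)[0]
    if x == 0:
        # Topic route: pick a topic, then a word from that topic, as in LDA.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=phi[z])[0]
    elif x == 1:
        # Document-specific route.
        w = rng.choices(vocab, weights=psi_d)[0]
    else:
        # Background route.
        w = rng.choices(vocab, weights=omega)[0]
    return x, w
```
      </preformat>
      <p>Only the words that take the topic route (x = 0) contribute to θ, which is the component the model later transfers to the recommendation task.</p>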
    </sec>
    <sec id="sec-9">
      <title>Posterior Inference</title>
      <p>
        The generative process is described in Algorithm 1 which combines
SWB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] model (lines 1-14) and PMF [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] model (lines 15-26). We
summarize the repeated sampling of word distributions for each
topic and user factors, and the predictions of user-item ratings.
      </p>
      <p>
        4.2.1 Learning Parameters. Computing the full posterior of the
parameters u_i, v_j, θ_j is intractable. Therefore, we adopt the EM-style
algorithm, as in CTRlda [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], to learn the maximum-a-posteriori
estimates. We refer the interested reader to CTRlda [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] for more
details. It was shown there that fixing θ_j to the estimate obtained from
vanilla LDA gives comparable performance. We found that the convergence
of the EM algorithm improved significantly when θ_j from the SWB topic
route is used as the initial estimate (see Fig. 3b).
      </p>
      <p>4.2.2 Making Predictions. After learning (locally) all the
parameters, subject to a convergence criterion, we can use the learned
latent features (u_i)_{i=1}^{I}, (v_j)_{j=1}^{J}, (θ_j*)_{j=1}^{J}, (ϵ_j*)_{j=1}^{J} in Eq. (3) for
predictions. Note that for new or unseen items we do not have the offset
value, i.e., ϵ_j = 0; hence the prediction relies completely on the topic
proportion derived from either the LDA or the SWB model.</p>
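      <p>For unseen items, the prediction rule above amounts to scoring each item by the inner product of the user vector and the item's topic proportion. A minimal sketch, in which the function name rank_unseen_items is a hypothetical helper and not part of the original method description:</p>
      <preformat preformat-type="code">
```python
def rank_unseen_items(u_i, thetas, top_k):
    """Rank new items for one user when the offset eps_j is zero.

    Each item is scored by u_i^T theta_j; the indices of the top_k
    highest-scoring items are returned in descending order of score.
    """
    scores = [sum(u * t for u, t in zip(u_i, theta)) for theta in thetas]
    order = sorted(range(len(thetas)), key=lambda j: scores[j], reverse=True)
    return order[:top_k]
```
      </preformat>
      <p>The same routine applies whether the topic proportions come from LDA or from the general topic route of SWB; only the input thetas change.</p>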
      <p>
        4.2.3 Discussion. It is a common practice that the ratings of
items are given on a scale, and the latent models try to predict
the rating for a new user-item pair. In such cases, the factorization
machines [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] are known to work for a variety of general prediction
tasks like classification, regression, or ranking. In our setup, the
ratings are binary, i.e., r_ij ∈ {0, 1}, where r_ij = 0 can be interpreted
in two ways: either user i is not interested in item j, or user i does not know about
item j. In this respect our goals differ from the prediction tasks considered in
factorization machines. Our study shows that we can make predictions
for unseen items while deriving meaningful latent factors.
      </p>
      <p>
        While making predictions for unseen items, it is important to
see how effectively the models can fuse the content information from
multiple sources. In our model, the semantic units are effective for
representing latent factors and offer advantages over the CTRlda
model. While the user preferences across the domains are very
different (see Fig. 1), the background word distributions are nearly
the same across all items, and therefore their contribution towards v_j
is not significant. Additionally, the specific words that occur in the
documents do not convey much information about the user
preferences. Hence, we can discard the Ω and ψ distributions and use only
the θ_j derived from the general topic route of the SWB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] model.
Subsequently, we demonstrate that CTRswb could learn better
representations for the latent features compared to the CTRlda [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENTS</title>
      <p>
        We demonstrate the efficacy of our approach (CTRswb) in both
single and cross-domain scenarios on the CiteULike dataset and the
MovieLens dataset, respectively. For the single-domain case, we adopt the same
experimental settings as those of CTRlda [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Since cross-domain
applications can be realized in multiple ways [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we consider the
shared-users setup across multiple domains in two different
contexts: (1) recommendations in the cold-start context, where we study
the impact of the number of topics in the auxiliary domain(s), and (2)
recommendations in the data sparsity context, where we study the
impact of the number of ratings in the auxiliary domain(s).
      </p>
      <p>Algorithm 1: Generative process for CTRswb model</p>
      <preformat>
 1  Select a background distribution over words Ω | β2 ∼ Dir(β2)
 2  for each topic k ∈ 1, ..., T do
 3      Select a word distribution ϕ_k | β0 ∼ Dir(β0)
 4  end
 5  for each document d ∈ 1, ..., D do
 6      Select a distribution over topics θ_d | α ∼ Dir(α)
 7      Select a special-words distribution over words ψ_d | β1 ∼ Dir(β1)
 8      Select a distribution over switch variables λ_d | γ ∼ Beta(γ)
 9      for n = 1 : N_d words in document d do
10          Select a switch variable x_dn | λ_d ∼ Mult(λ_d)
11          Select z_dn | {θ_d, x_dn} ∼ Mult(θ_d)^δ(x_dn,1) δ(z_dn, SW)^δ(x_dn,2) δ(z_dn, BG)^δ(x_dn,3)
12          Generate a word: w_dn | {z_dn, x_dn, ϕ, ψ_d, Ω} ∼ Mult(ϕ_{z_dn})^δ(x_dn,1) Mult(ψ_d)^δ(x_dn,2) Mult(Ω)^δ(x_dn,3)
13      end
14  end
15  for user i ∈ 1...N_u do
16      Draw u_i ∼ N(0, λ_u^{-1} I_T)
17  end
18  for item j ∈ 1...D do
19      Draw ϵ_j ∼ N(0, λ_v^{-1} I_T)
20      Compute v_j = ϵ_j + θ_j
21  end
22  for user i ∈ 1...N_u do
23      for item j ∈ 1...D do
24          Draw r_ij ∼ N(u_i^T v_j, c_ij)
25      end
26  end
      </preformat>
    </sec>
    <sec id="sec-11">
      <title>CiteULike Dataset</title>
      <p>
        We conducted single-domain experiments using a dataset from
CiteULike², a free social network service for scholars that
allows users to organize (in personal libraries) and share the papers they
are reading. We use the metadata of CiteULike from [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] collected
between 2004 and 2010. The dataset contains 204,986 pairs of
observed ratings with 5,551 users and 16,980 articles. Each user has
37 articles in their library on average, and only 7% of the users
have more than 100 articles. That is, the density of the dataset is quite
low: 0.2175%. Each item (article) is represented by its title and abstract.
After preprocessing the corpus, 8,000 unique words are retained
as the vocabulary.
      </p>
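      <p>The density and the average library size quoted above follow directly from the dataset counts. The short check below reproduces that arithmetic; the variable names are illustrative only.</p>
      <preformat preformat-type="code">
```python
# Counts as reported for the CiteULike dataset in the text above.
num_ratings = 204986
num_users = 5551
num_articles = 16980

# Density: observed user-article pairs divided by all possible pairs.
density_percent = 100.0 * num_ratings / (num_users * num_articles)

# Average number of articles per user library.
avg_articles_per_user = num_ratings / num_users

print(round(density_percent, 4))
print(round(avg_articles_per_user))
```
      </preformat>
      <p>The two printed values agree with the reported density of 0.2175% and the average of 37 articles per user.</p>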
    </sec>
    <sec id="sec-12">
      <title>MovieLens Dataset</title>
      <p>
        To conduct the cross-domain experiments, we used the
dataset provided by GroupLens [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We extracted the five genres with the
most ratings out of the 19 genres in the 1 million MovieLens
dataset: Action, Comedy, Drama, Romance, and Thriller. Since the
MovieLens dataset has only user-generated tags, we crawled IMDB³
to get item descriptions for the movies. The basic statistics of the
collected dataset are reported in Table 2.
      </p>
      <sec id="sec-12-1">
        <p>² http://www.citeulike.org/ ³ http://www.imdb.com</p>
        <p>
We evaluate the recommendation tasks using the standard
performance metrics: Precision, Recall, and Mean Average Precision (MAP).
The results shown are averaged over all the users. In our studies,
we set the parameters of PMF and CTRlda by referring to [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. For
PMF, λ_u = λ_v = 0.01, a = 1, b = 0.01. For the CTRlda model, T = 200,
λ_u = 0.01, λ_v = 100, a = 1, b = 0.01. For the CTRswb model, we set
α = 0.1, β0 = β2 = 0.01, β1 = 0.0001, γ = 0.3 (all weak symmetric
priors are set to default), T = 200, λ_u = 0.01, λ_v = 100, a = 1,
b = 0.01.
        </p>
        <p>
In this set of experiments, we compare the performance of
probabilistic matrix factorization (PMF), the CTR model [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which makes
use of latent topics from LDA (CTRlda), and the proposed CTRswb
model. Fig. 3a shows our results on the CiteULike dataset under the
settings defined in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. In the graph, we also show how the topic
proportions from LDA and SWB alone (i.e., when the user rating
patterns from the training set are not considered) make predictions on
the test set for top-K (from 20 to 300) recommendations.
        </p>
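        <p>The ranking metrics named above can be sketched as follows. This is a common formulation given as an assumption; the exact definitions used in the experiments (for example, the normalizer of average precision) are not stated in the text, and the function names are illustrative.</p>
        <preformat preformat-type="code">
```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and Recall@k for one user.

    ranked   : item ids ordered by predicted score, best first
    relevant : set of held-out items the user actually liked
    """
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_at_k(ranked, relevant, k):
    """Average of precision values at the ranks where a relevant item appears.

    Assumption: normalized by min(len(relevant), k).
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def mean_average_precision(rankings, relevants, k):
    """MAP@k: average precision averaged over all users."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)
```
        </preformat>
        <p>Averaging over users, as in mean_average_precision, matches the statement that the reported results are averaged over all the users.</p>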
        <p>We can see that CTRswb consistently gives better
recommendations than the other factor models for different numbers of
recommendations. Moreover, the margin of improvement of CTRswb over
CTRlda is larger for smaller numbers of recommendations.
The PMF model lacks the content information, and
the pure content-based models do not utilize user preferences;
both therefore underperform with respect to the CTR-based models.</p>
        <p>Further, we show the performance of the CTR-based methods
when subjected to iterative optimization of the parameter θ_j. We
observe that the CTRswb model converges faster than the
CTRlda model, as plotted in Fig. 3b. The error gap
analysis shows that the latent topics transferred from the SWB model are
in agreement with the consistent performance improvement of
CTRswb over CTRlda.</p>
        <p>In Fig. 3c, we show the performance of the CTR-based methods both
with and without θ_j optimization. The reason CTRswb
gives the best performance in both cases is that real-world
item descriptions contain many item-specific terms, which are
not very helpful for the recommendations. By removing
the background terms of the corpus and the specific terms of each
item, we can estimate the θ_j value more precisely.</p>
        <p>In the cross-domain settings, we consider every genre in the dataset
as a target domain while the other genres are treated as its
auxiliary domains. For example, if the “Action” genre is the target domain,
the other four genres constitute the source domains.</p>
        <p>6.2.1 Cold-start scenario and the impact of number of topics: In
this study, we consider the scenario in which no rating information
from the target domain is used while learning the latent topic features.
From Table 2, we pick one of the genres as the target domain and
create five cold-start scenarios (one for each genre in the dataset).
We ran the algorithms PMF, LDA, SWB, CTRlda, and CTRswb for
each of the cold-start situations.</p>
        <p>Figs. 4a–4e show the mean average precision for top-20
recommendations for the five target genres. We can see that the MAP score of the
PMF model did not improve much when the number of latent
factors was increased. Notice that, in many cases, the CTRlda method
degrades the quality of recommendations when compared to
traditional PMF. Moreover, CTRlda is highly sensitive to the number of
latent factors, and we noticed that it consistently performs worse than
CTRswb. A potential reason is a problem with
the learned topics that are obtained by feature fusion from multiple
domains. The CTRswb approach explicitly models these aspects
and provides the ability to improve the latent features. As we can see
in the figures, our model consistently produces better-quality
recommendations for different numbers of latent factors.</p>
        <p>Fig. 4f shows the performance when averaged over all genres.
From the plot, we observed that using 80 latent factors gave the best
performance for all genres except the comedy genre. The
deviation in the case of the “comedy” genre is expected, as the number of
items in its source domains is relatively small. Table 3 shows the
performance of the different recommendation algorithms when
80 latent topics are used. The proposed CTRswb model
improves over CTRlda and the other methods in all the
cold-start scenarios.</p>
      </sec>
      <sec id="sec-12-8">
        <p>Table 3 (only the row and column labels survive in the extracted text): rows are the target genres Action, Comedy, Drama, Romance, and Thriller; for each genre the methods PMF, LDA, SWB, CTRlda, and CTRswb are compared.</p>
        <p>Figure panel titles: (d) Romance; (e) Thriller; (f) Mean of all Genres.</p>
        <p>
6.2.2 Data sparsity scenario and the impact of number of ratings:
In this study, to explore the behavior of cross-domain
recommendation, we examined the latent topic space under data sparsity
scenario. We use the same movielens data as in Table 2 and create
10 data sparsity situations by incrementally removing (random)
10% of the ratings from the source genres. Throughout, similar to
study in cold-start context, we do not use ratings from the target
genre. To validate our findings, we have shown the evaluations
only for the topic space of 80 latent factors. Figs. 5a–5e shows mean
average precision of top20 recommendations for diferent degrees
of sparsity (rating ratio) in the source domain.</p>
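        <p>The sparsity protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name sparsity_levels, the (user, item, rating) triple layout, the fixed seed, and the choice of nested subsets with rating ratios 1.0, 0.9, ..., 0.1 are all hypothetical, since the text only states that a random 10% of the source-genre ratings is removed incrementally.</p>
        <preformat><![CDATA[
```python
import random


def sparsity_levels(ratings, n_levels=10, seed=0):
    """Build nested, increasingly sparse copies of the source-genre ratings.

    ratings: list of (user, item, rating) triples (assumed layout).
    Returns a dict mapping rating ratio (1.0, 0.9, ..., 1/n_levels) to the
    ratings kept at that ratio.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(ratings)
    rng.shuffle(shuffled)          # ratings at the tail are removed first
    total = len(shuffled)
    levels = {}
    for i in range(n_levels):
        # Each step drops a further 1/n_levels of the original ratings.
        kept = total * (n_levels - i) // n_levels
        levels[(n_levels - i) / n_levels] = shuffled[:kept]
    return levels


# Toy source-domain data (hypothetical): 10 users x 10 items, all rated 1.
source_ratings = [(u, i, 1) for u in range(10) for i in range(10)]
levels = sparsity_levels(source_ratings)
for ratio in sorted(levels, reverse=True):
    print(f"rating ratio {ratio:.1f}: {len(levels[ratio])} ratings")
```
]]></preformat>
        <p>Keeping the subsets nested means every sparser setting is a strict subset of the denser one, so differences between rating ratios reflect the amount of data rather than a fresh random draw. Whether the original experiments used nested or independent samples is not stated in the text.</p>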
        <p>The effect of the number of ratings is much clearer and more
straightforward than the effect of the number of latent factors. The results
reveal that the number of ratings in the source genres has a significant
impact on accuracy. However, the scale of the impact differs
across target domains, as some genres have fewer ratings.
The plots show that the more user preferences are
available in the auxiliary domains, the better the accuracy of
recommendations on the target domain. As the number of ratings
increases, PMF, LDA, SWB, and CTRlda show moderate
improvements in terms of MAP. Our approach consistently performs
better than these methods, by a large margin. Overall,
the results show that the latent factors of CTRswb are
reliable and can improve the recommendations even under extremely
sparse data scenarios.</p>
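        <p>For readers who want to reproduce the metric used in these comparisons, the following is a hedged Python sketch of mean average precision over the top-20 recommendations. The function names, the dictionary layout, and the normalisation by min(number of relevant items, k) are assumptions; this section does not give the exact MAP definition used in the experiments, and other normalisations are also common.</p>
        <preformat><![CDATA[
```python
def average_precision_at_k(ranked_items, relevant_items, k=20):
    """AP@k for one user: precision at each hit rank, summed and normalised."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this cut-off
    # Assumed normalisation: best achievable number of hits within the top k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision(recommendations, ground_truth, k=20):
    """MAP@k over the users that have at least one relevant item."""
    users = [u for u in recommendations if ground_truth.get(u)]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations[u], ground_truth[u], k)
        for u in users
    )
    return total / len(users)


# Toy example (hypothetical users and items).
recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
truth = {"u1": {"a", "c"}, "u2": {"z"}}
map_at_20 = mean_average_precision(recs, truth, k=20)
print(round(map_at_20, 3))
```
]]></preformat>
        <p>In the toy example, user u1 has hits at ranks 1 and 3 and user u2 has a single hit at rank 3, so the per-user scores are 5/6 and 1/3 and their mean is 7/12.</p>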
      </sec>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS</title>
      <p>We have proposed an approach to validate our hypothesis that the
quality of recommendations can be improved by explicitly
utilizing the general topic-word distributions while learning the latent
features. Our approach recommends items to users based on both
content and user preferences, and can make the best use of the content
information in both single and cross-domain scenarios. Our results
on the single-domain task show its superiority over pure latent factor and
CTRlda models, and our results on the cross-domain task demonstrate its
robustness under cold-start and data sparsity situations.</p>
      <p>In the future, we plan to explore cross-domain recommendation
scenarios in heterogeneous settings (e.g., movies to books). In
addition, we have used a simple collaborative filtering approach
with no rating information from the target domain; we believe that
utilizing the target-domain ratings could result in better cross-domain
recommendations.</p>
      <p>[Figure panels: (d) Romance, (e) Thriller, (f) Mean of all Genres.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>
          , Andrew Y Ng, and
          <string-name>
            <given-names>Michael I</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Iván</given-names>
            <surname>Cantador</surname>
          </string-name>
          , Ignacio Fernández-Tobías,
          <string-name>
            <given-names>Shlomo</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Cross-domain recommender systems</article-title>
          .
          <source>In Recommender Systems Handbook</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Chaitanya</given-names>
            <surname>Chemudugunta</surname>
          </string-name>
          , Padhraic Smyth, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steyvers</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Modeling general and specific aspects of documents with a probabilistic topic model</article-title>
          .
          <source>In NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chen</surname>
          </string-name>
          , Wynne Hsu, and
          <string-name>
            <given-names>Mong Li</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Making recommendations from multiple domains</article-title>
          .
          <source>In ACM SIGKDD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F Maxwell</given-names>
            <surname>Harper</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joseph A</given-names>
            <surname>Konstan</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The movielens datasets: History and context</article-title>
          .
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS)</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yifan</given-names>
            <surname>Hu</surname>
          </string-name>
          , Yehuda Koren, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          .
          <source>In ICDM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alexandros</given-names>
            <surname>Karatzoglou</surname>
          </string-name>
          , Xavier Amatriain, Linas Baltrunas, and
          <string-name>
            <given-names>Nuria</given-names>
            <surname>Oliver</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering</article-title>
          .
          <source>In ACM RecSys.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          , Robert Bell, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          .
          <source>Computer</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Bin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiangyang</given-names>
            <surname>Xue</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Transfer learning for collaborative filtering via a rating-matrix generative model</article-title>
          .
          <source>In ICML. ACM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Sujian</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jiwei</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tao</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wenjie</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Baobao</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A novel topic model for automatic term extraction</article-title>
          .
          <source>In ACM SIGIR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Julian</given-names>
            <surname>McAuley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Hidden factors and hidden topics: understanding rating dimensions with review text</article-title>
          .
          <source>In ACM RecSys.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Andriy</given-names>
            <surname>Mnih</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ruslan R</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Probabilistic matrix factorization</article-title>
          .
          <source>In NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Orly</given-names>
            <surname>Moreno</surname>
          </string-name>
          , Bracha Shapira, Lior Rokach, and
          <string-name>
            <given-names>Guy</given-names>
            <surname>Shani</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Talmud: transfer learning for multiple domains</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Rong</given-names>
            <surname>Pan</surname>
          </string-name>
          , Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Scholz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>One-class collaborative filtering</article-title>
          .
          <source>In ICDM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Weike</given-names>
            <surname>Pan</surname>
          </string-name>
          , Nathan N Liu, Evan W Xiang, and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Transfer learning to predict missing ratings via heterogeneous user feedbacks</article-title>
          .
          <source>In IJCAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Weike</given-names>
            <surname>Pan</surname>
          </string-name>
          , Evan Wei Xiang, Nathan Nan Liu, and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Transfer Learning in Collaborative Filtering for Sparsity Reduction</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Factorization machines</article-title>
          .
          <source>In ICDM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Shaghayegh</given-names>
            <surname>Sahebi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Content-Based Cross-Domain Recommendations Using Segmented Models</article-title>
          .
          <source>CBRecSys</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Yue</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alan</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Tags as bridges between domains: Improving recommendation with tag-induced cross-domain collaborative filtering</article-title>
          .
          <source>In International Conference on User Modeling, Adaptation, and Personalization</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steyvers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tom</given-names>
            <surname>Griffiths</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Probabilistic topic models</article-title>
          .
          <source>Handbook of latent semantic analysis</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Fatemeh</given-names>
            <surname>Vahedian</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robin D</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Predicting Component Utilities for Linear-Weighted Hybrid Recommendation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Chong</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Collaborative topic modeling for recommending scientific articles</article-title>
          .
          <source>In ACM SIGKDD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Tong</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Julian</given-names>
            <surname>McAuley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Irwin</given-names>
            <surname>King</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Improving latent factor models via personalized feature projection for one class recommendation</article-title>
          .
          <source>In ACM CIKM.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>