<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ComplexRec</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Review-Based Cross-Domain Collaborative Filtering: A Neural Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thanh-Nam Doan</string-name>
          <email>tdoan@albany.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shaghayegh Sahebi</string-name>
          <email>ssahebi@albany.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University at Albany - SUNY</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>20</volume>
      <abstract>
        <p>Cross-domain collaborative filtering recommenders exploit data from other domains (e.g., movie ratings) to predict users' interests in a different target domain (e.g., suggesting music). Most current cross-domain recommenders focus on modeling user ratings but pay limited attention to user reviews. Additionally, due to the complexity of these recommender systems, they cannot provide any information to users to support user decisions. To address these challenges, we propose the Deep Hybrid Cross Domain (DHCD) model, a cross-domain neural framework that can simultaneously predict user ratings and provide useful information to strengthen the suggestions and support user decisions across multiple domains. Specifically, DHCD enhances the predicted ratings by jointly modeling two crucial facets of users' product assessment: ratings and reviews. To support decisions, it models and provides natural review-like sentences across domains according to user interests and item features. This model is robust in integrating user rating and review information from more than two domains. Our extensive experiments show that DHCD can significantly outperform advanced baselines in rating prediction and review generation tasks. For rating prediction tasks, it outperforms cross-domain and single-domain collaborative filtering as well as hybrid recommender systems. Furthermore, our review generation experiments suggest an improved perplexity score and transfer of review information in DHCD.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems; • Computing methodologies → Neural networks.</p>
      <p>Keywords: Cross-domain collaborative filtering, neural network, hybrid collaborative filtering</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Nowadays, users are overwhelmed by the number of choices online.
Recommender systems are increasingly used as an essential tool
to alleviate this problem. Despite improvements in recommender
systems, many of them still suffer from problems, including
cold-start [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and difficulty in explaining their suggestions [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
Moreover, collaborative filtering recommenders [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] cannot use obvious
feature-based relations between users and items. Content-based
approaches cannot capture deeper social or semantic similarities
between users and items, nor can they suggest novel items (outside
the scope of user profile features) to users [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Two major approaches to address some of these problems are
hybrid [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and cross-domain [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] recommender systems. Hybrid
recommender systems merge content-based and collaborative
filtering approaches to provide higher-quality recommendations. Some
hybrid recommender systems jointly model user ratings and
reviews to introduce a more sophisticated view of user interests and
item features, which leads to improved recommendation results [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        The idea behind cross-domain recommendation systems is to
share useful information across two or more domains to improve
recommendation results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They work by transferring
information from one or more source or auxiliary domains to suggest useful
items in a target domain. In particular, when user history in the target
domain (e.g., books) does not provide enough information about
user interests, user preferences in another source domain (e.g.,
movies) can provide useful insights that can lead to more accurate
or novel recommendations¹. In addition to improving
recommendation results, cross-domain recommender algorithms provide a
solution to problems, such as cold-start or user profiling, in
single-domain recommenders.
      </p>
      <p>
        Both hybrid and cross-domain recommender systems have been
shown to be successful in the current literature. However, a combination
of the two has rarely been studied. Additionally, the problem of
providing more information to users to support their decisions in
cross-domain recommender systems has not been studied. Most
current research in cross-domain recommenders focuses on
collaborative filtering cross-domain approaches [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. These approaches
incorporate users’ explicit (e.g., rating) or implicit (e.g., purchase)
feedback in the auxiliary domain to recommend items in the target
domain. Many of these algorithms jointly model multiple domains
by sharing common user latent representations across them.
Collaborative filtering cross-domain recommenders, similar to their
single-domain counterparts, suffer from ignoring content
information. Advanced models built only on users’ rating
or binary feedback make it hard to reason about why a specific
user may be interested in an item. Moreover, these recommender
algorithms lose the explicit user-item similarities by ignoring an
important source of information: user reviews.
      </p>
      <p>To further enhance the performance and transparency of
cross-domain recommendation systems, we propose to combine hybrid
and cross-domain approaches. With this fusion, we can
benefit from the strengths of both hybrid and cross-domain
recommender systems: cross-domain modeling will enhance user latent
features by providing extra information from other domains
(especially in sparser ones), while reviews will bring another dimension
for enriching user and item latent features and offer insights to
increase recommendation transparency. Therefore, merging
the two will enrich content features by using review information
across domains as well as enhance prediction performance.
¹While other definitions of domain exist in the literature, e.g., time-based domains, in
this paper we focus on item domains (e.g., item type or category).</p>
      <p>
        Accordingly, we propose Deep Hybrid Cross Domain (DHCD)
recommender, which models various types of user feedback (both
ratings and reviews) across multiple domains under a neural
network framework. We use neural networks as a natural choice to
model reviews due to their success in natural language processing
and generating natural language sentences [
        <xref ref-type="bibr" rid="ref26 ref5">5, 26</xref>
        ]. In addition to
using reviews for producing better-quality suggestions, DHCD can
generate natural and useful reviews to support user decisions for
suggested cross-domain items. By generating a review that is based
on the specific user’s interests across domains and other reviews,
we can help clarify why a specific item is recommended to a user. Our
model shares information across domains at two levels: by sharing
users’ latent representations and cascading them into reviews’ latent
representations. It can capture non-linear user-item relationships
by having a neural network framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our results and findings
of this research are summarized as follows:
• We propose a neural network framework named Deep
Hybrid Cross Domain (DHCD) model, which unifies ratings and
reviews of users and items across multiple domains.
• To the best of our knowledge, DHCD is the first framework
that is able to automatically generate cross-domain reviews,
which in turn can provide decision support for cross-domain
recommendations.
• We design and implement multiple experiments to evaluate
DHCD’s performance on three real-world datasets. Our
evaluation is performed via two main tasks, rating prediction and
review generation, to answer four research questions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Here, we briefly review the literature on cross-domain
recommendation and neural network-based collaborative filtering.
Cross-Domain Recommendation focuses on learning user
preferences from data across multiple domains [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. There are two
main approaches to cross-domain recommendation: collaborative filtering [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
and content-based methods [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In this work, we focus on
collaborative filtering cross-domain recommendation. Similar to
single-domain collaborative filtering, research on cross-domain
recommendation usually uses matrix factorization. For example, Pan et
al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] propose a cross domain recommendation system based on
matrix factorization by using a coordinate system transfer method.
Elkahky et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] use a deep learning framework to improve the
performance of cross-domain recommendation and also provide a
scalable method to handle large datasets. However, the main
limitation of these methods is that they do not consider item reviews.
      </p>
      <p>
        Xin et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposed the first review-based cross-domain
recommender model: a graphical model capturing
user ratings and item reviews across domains; however, reviews are not
used to model user latent features. Later, Song et al. proposed a joint
tensor factorization model to capture both user reviews and implicit
feedback on items to provide cross-domain recommendations [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
However, it does not capture non-linearities across domains, nor
does it model reviews as natural sequences of words. None of the above
works generate reviews.
      </p>
      <p>Figure 1: An overview of the Deep Hybrid Cross Domain
(DHCD) recommendation system, with a rating regression component
(Q-layer networks over shared user and domain-specific item embeddings)
and a review generation component for each domain.</p>
      <p>
        Neural Frameworks for Collaborative Filtering. Due to their
ability to approximate non-linear relations between users and items, neural
networks are rapidly growing in recommendation systems [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        He et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a fusion model that combines matrix
factorization and a multi-layer perceptron. Despite the efficiency of their
proposed model, it does not consider reviews and is not extended to
cross-domain recommendation. Collaborative Deep
Learning (CDL) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] overcomes the sparsity of ratings by using auxiliary
information such as reviews. Treating reviews as a set of words, their
model outperforms baselines, but not considering the sequential
nature of words in reviews is a limitation.
      </p>
      <p>
        Review Generation. Ni et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] presented one of the first works
that focuses on generating reviews along with preference prediction.
Ni and McAuley [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] propose a neural network based on an
attention model to assist users in writing reviews of items. However, these
works and others [
        <xref ref-type="bibr" rid="ref1 ref8">1, 8</xref>
        ] do not model the preference between users
and items, nor are they extendable to cross-domain recommenders.
      </p>
    </sec>
    <sec id="sec-4">
      <title>PROPOSED FRAMEWORK</title>
      <p>In this section, we describe the architecture of the Deep Hybrid Cross
Domain (DHCD) recommendation system in detail.</p>
    </sec>
    <sec id="sec-5">
      <title>Architecture</title>
      <p>DHCD predicts user ratings on items and generates user reviews
on them using two main components: the rating regression
component and the review generation component. In the rating regression
component (RRC), user ratings on items of each domain are
modeled as a function of user and item latent representations. For each
user, this component learns a shared latent representation across
all domains. Moreover, the shared user representations act
as a gate to transfer information across domains. The shared
user latent representations, in combination with domain-specific
item latent representations, predict user ratings on items. The
review generation component (RGC) generates user reviews on items
according to user, item, and word latent representations. In this
component, the user and item representations from the rating
regression component work as a guide to learn review word embeddings
per user-item review. This guidance helps share word embedding
information across domains. Figure 1 illustrates the architecture of
our model. In the following, we present our model in more detail.</p>
      <p>Notations: We model the system to include a set of users U and a
set of item domains D. Each of these domains includes a set of items
I^d, d ∈ D. For a user u ∈ U and each item i ∈ I^d, the training
data may include the user’s rating on that item (r^d_ui) and the user’s review
on that item (s^d_ui). Accordingly, we model the training data in domain d
as a set of tuples T^d = {(u, i, s, r) | u ∈ U, i ∈ I^d, s ∈ S^d, r ∈ R^d}.
Given training data in all domains T, our goal is to simultaneously
estimate user u’s missing rating on item i in domain d (r̂^d_ui) and
generate user u’s missing textual review on that item (ŝ^d_ui).</p>
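      <p>To make this notation concrete, the training data can be held in plain
dictionaries keyed by domain. The following is a minimal sketch with
hypothetical toy data; the names and values are illustrative only, not the
paper's actual datasets:</p>

```python
# Toy illustration of the notation: users U, domains D, and per-domain
# training tuples T^d = {(u, i, s, r)}. All names/values are hypothetical.
users = {"u1", "u2"}            # U
domains = {"books", "music"}    # D

# T^d: one list of (user, item, review, rating) tuples per domain d
training_data = {
    "books": [("u1", "b12", "good comic book", 4.0),
              ("u2", "b7",  "dull plot",       2.0)],
    "music": [("u1", "m3",  "nice jazz song",  5.0)],
}

def observed_ratings(domain):
    """Return {(u, i): r^d_ui} for all observed ratings in one domain."""
    return {(u, i): r for (u, i, _s, r) in training_data[domain]}
```

      <p>For example, observed_ratings("music") yields {("u1", "m3"): 5.0},
the ratings the rating regression component is trained to reproduce.</p>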
      <p>Rating Regression Component (RRC):
The main purpose of this component is to form a structure to
infer user and item representations using observed user feedback
on items across all domains. To do this, we model each user u’s
interests as latent factors v_u and item i’s representation (in domain
d) as latent factors v^d_i. Then, user u’s predicted rating on item i, r̂^d_ui,
is calculated as a function g_r(·) of v_u and v^d_i. Formally, we have:
r̂^d_ui = g_r(v_u, v^d_i)    (1)</p>
      <p>
        In many single-domain factorization-based recommender
systems, g_r is modeled as the vector dot product of these latent factors
plus some bias b [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Namely, r̂_ui = v_u^T v^d_i + b. This specification has
some limitations that make it inappropriate for our cross-domain
problem. First, the simple factorization formulation is not fit for a
cross-domain problem, as it does not transfer information across
domains. Also, the predicted ratings in this model are assumed to
be a linear combination of user and item latent factors. However,
recent work suggests that using a non-linear model can enhance the
representation ability of user and items, and lead to more accurate
results [
        <xref ref-type="bibr" rid="ref12 ref5">5, 12</xref>
        ]. More specifically, in cross-domain recommenders,
Xin et al. have shown that user ratings across different domains
can have non-linear relationships with each other [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Finally, the
above formulation requires a shared latent space between users and
items. This assumption can restrict the expressive capacity of
the model, since it (i) limits the user and item latent vectors to have
equal sizes, and (ii) assumes the k-th element of the user latent vector
must only interact with the corresponding element of the item latent
factor. For further information, He et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] provide an example
that illustrates these restrictions.
      </p>
      <p>
        To tackle the non-linearity problem, we model дr using deep
neural networks. Neural networks have been successfully used in
collaborative filtering problems [
        <xref ref-type="bibr" rid="ref2 ref25 ref7">2, 7, 25</xref>
        ] and can inherently model
non-linear relationships [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To have a cross-domain solution, we
extend the collaborative neural network to include multiple item
domains. As shown in the left side of Figure 1, for each domain d, we
construct a multi-layer perceptron network (H^d) with Q layers. To
share and transfer information across different domains, we model
the user latent factors v_u to be shared across all domains.
Additionally, to avoid having a shared latent space between users and items,
we use concatenation instead of the dot product in g_r. Consequently,
the input x^d_ui to our multi-layer perceptron’s first layer is the
concatenation of the embedding vector v_u of user u and the embedding
vector v^d_i of item i ∈ I^d. Formally, x^d_ui = [v_u ; v^d_i].
      </p>
      <p>Network H^d maps this input x^d_ui to rating r̂^d_ui. We denote the q-th
hidden layer of H^d as h^d_q, which applies a non-linear function
to the output of h^d_{q−1}; i.e., h^d_q = ReLU(W^d_q h^d_{q−1} + b^d_q),
where W^d_q and b^d_q are the parameters of H^d’s q-th layer for domain d
and ReLU(x) = max(0, x). For the first layer, h^d_0 = x^d_ui is the input.
We ensure full connectivity between every two adjacent hidden
layers h^d_q and h^d_{q−1}. We use regression to map the output vector ŷ^d_Q
of the final layer to the prediction value r̂^d_ui, i.e., r̂^d_ui = w^d_y ŷ^d_Q + b^d_y,
where r̂^d_ui is the predicted rating of user u on item i in domain d, and
w^d_y ∈ R^r and b^d_y ∈ R are regression parameters.</p>
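      <p>The forward pass of one domain's network H^d can be sketched in plain
Python. This is a hedged illustration: a real implementation would use a
deep-learning library with learned weights, and all values below are
placeholders:</p>

```python
def relu(v):
    """Element-wise ReLU(x) = max(0, x)."""
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    """W·x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def predict_rating(v_u, v_i, layers, w_y, b_y):
    """Forward pass of one domain's network H^d.

    v_u, v_i : user / item latent vectors, concatenated into x^d_ui
    layers   : list of (W_q, b_q) pairs, each applied with ReLU
    w_y, b_y : final regression parameters mapping y_Q to a scalar rating
    """
    h = v_u + v_i                      # x^d_ui = [v_u ; v^d_i]
    for W_q, b_q in layers:
        h = relu(affine(W_q, b_q, h))  # h^d_q = ReLU(W^d_q h^d_{q-1} + b^d_q)
    return sum(w * y for w, y in zip(w_y, h)) + b_y
```

      <p>With one hidden layer and hand-picked weights, predict_rating([1.0], [2.0],
[([[1.0, 1.0], [0.0, 1.0]], [0.0, 0.0])], [1.0, 1.0], 0.5) returns 5.5: the
concatenated input [1, 2] is mapped to hidden state [3, 2] and then regressed.</p>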
      <p>To learn the parameters of RRC, we optimize the following
regression loss function:</p>
      <p>Lr = Õ
Õ</p>
      <p>d d 2
(rui − rˆui )
d ∈D u ∈U,i ∈Id
(2)
where rudi is the observed rating of user u and i in domain d.
3.3</p>
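      <p>Equation 2 translates directly into code. A small sketch follows; the
dictionary layout is an assumption made for illustration:</p>

```python
def rating_loss(observed, predicted):
    """L_r = sum over domains d and observed (u, i) of (r^d_ui - rhat^d_ui)^2.

    observed / predicted: {domain: {(user, item): rating}} dictionaries;
    only pairs present in `observed` contribute to the loss.
    """
    return sum((r - predicted[d][ui]) ** 2
               for d, ratings in observed.items()
               for ui, r in ratings.items())
```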
      <p>Review Generation Component (RGC):
This component models and generates reviews for user-item
pairs in the cross-domain setting. Here, we model user, item, and review
word latent factors to generate natural language sentences.</p>
      <p>
        Recently, recurrent neural networks with components such as
long short-term memory (LSTM) and gated recurrent units (GRU)
have shown high performance in natural language processing-related
tasks such as image captioning and question answering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Inspired
by their success, we adopt the LSTM as a component of our review
generation process.
      </p>
      <p>As shown in Figure 1, for each domain d, we construct a
separate LSTM model H̄^d that connects to the rating regression
component. Consider s^d_ui, user u’s review on item i in domain d, as a
sequence of words t_j, where j ≤ J_ui (J_ui is the number of words in
this review). Given a text sequence t_1, t_2, ..., t_{J_ui}, the LSTM network
updates its hidden state (h̄^d_j) at step j according to t_j
and the previous step’s hidden state (h̄^d_{j−1}). Subsequently, the network
predicts t_{j+1}, step (j + 1)’s word, using all of its previous words
(t_&lt;j+1). The output layer is connected to a softmax layer. The
general idea of review modeling is expressed by p(t_j | t_&lt;j, Φ^d) = δ(h̄^d_j),
where Φ^d represents the neural network’s parameters for domain
d, and δ(·) is the softmax function. Each hidden state h̄^d_j is modeled
as a function of word t_j and the previous state h̄^d_{j−1}.</p>
      <p>
        The above “vanilla” LSTM can only model sentences from a
corpus and is unable to embed user and item latent features. For
reviews to represent user tastes, we have to make sure to include
user and item features. To enhance the modeling power, we first apply
word2vec [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to the corpus of review texts to learn the embedding
vector of each word. Then, for each word t_j in the review, we
concatenate this embedding vector with the user and item latent vectors to
create a latent input vector, i.e., [word2vec(t_j); v_u; v^d_i]. There
are three main advantages to this representation mechanism: (i)
concatenation with user and item latent vectors ensures that the
user and item information does not vanish over steps, which
can enhance sequence generation; (ii) since user latent vectors
are shared across domains, concatenation with them
works as a means to transfer review information across domains;
and (iii) word2vec is able to learn hidden word characteristics
within the corpus which cannot be inferred from one-hot encoding
vectors or tf-idf [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
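      <p>The per-step input construction can be sketched as follows; the
embedding table stands in for trained word2vec vectors, and all values
are hypothetical:</p>

```python
def review_step_inputs(review_words, embeddings, v_u, v_i):
    """Build the RGC input for each word t_j of a review:
    [word2vec(t_j); v_u; v^d_i].  Because v_u is shared across domains,
    every step carries cross-domain user information."""
    return [embeddings[w] + v_u + v_i for w in review_words]
```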
      <p>
        To learn the parameters of the LSTM network, we optimize the
negative log-likelihood of the review data:
Ls = − Σ_{d∈D} Σ_{(u,i,s,r)∈T^d} Σ_{j≤J_ui} log p(t_j | t_&lt;j, Φ^d)    (3)
The two components of DHCD are trained jointly by minimizing
L = λr·Lr + λs·Ls + λ·(‖Vu‖² + ‖Vi‖²)    (4)
where hyperparameters λr and λs control the trade-off between the
rating regression and review generation tasks, λ is the regularization
term for avoiding overfitting, and Vu and Vi are matrices that stack all
latent factors of users/items in all domains. Φ represents all
parameters of DHCD. The above loss function can be efficiently optimized
in an end-to-end manner using back-propagation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
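      <p>The joint objective can be sketched as below. Note that the exact
regularized terms in Equation 4 are our reconstruction (the original
equation body is garbled in this copy), so treat the regularizer as an
assumption:</p>

```python
def joint_loss(l_r, l_s, v_users, v_items, lam_r=1.0, lam_s=1.0, lam=0.01):
    """L = lam_r*L_r + lam_s*L_s + lam*(||V_u||^2 + ||V_i||^2).

    l_r, l_s: rating-regression and review-generation losses;
    v_users, v_items: lists of latent vectors stacked across all domains.
    The squared-Frobenius regularizer is an assumed reconstruction.
    """
    reg = (sum(x * x for v in v_users for x in v)
           + sum(x * x for v in v_items for x in v))
    return lam_r * l_r + lam_s * l_s + lam * reg
```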
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
      <p>In this section, we evaluate our proposed model against several
baselines to demonstrate the robustness of DHCD.</p>
    </sec>
    <sec id="sec-7">
      <title>Datasets</title>
      <p>
        We consider three category combinations of Amazon datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
Book and Digital Music; Book and Office Products; Digital Music
and Office Products. For each cross-domain dataset, we select users
who made purchases and wrote reviews in both categories. In each
review, we filter out words whose frequency is less than 50. Table 1
describes statistics of the datasets.
      </p>
      <p>
        Training/Test Data: For each dataset and each user, we
chronologically split their first 80% of ratings and reviews as training data and
the remaining 20% as test data.
      </p>
      <p>
        Baselines: We compare against the following methods.
• Matrix factorization (MF) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: It uses user and item ratings
as input. The predicted value is a linear combination of the
interaction between user and item latent features as well as
the user/item/global bias.
• Neural Collaborative Filtering (NCF) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: With ratings as its
input, this single-domain model combines a neural network
and matrix factorization to capture the non-linear interaction
between users’ and items’ latent factors.
• Collaborative Deep Learning (CDL) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]: Using ratings and
a bag of words of reviews, this single-domain model fuses
a neural network and topic modeling.
• Collaborative Filtering with Generative Concatenative
Networks (CF-GCN) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]: This single-domain hybrid model
unifies both ratings and reviews under a neural framework.
• Cross-domain neural network (CDN) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: This model utilizes
neural networks for cross-domain recommendation.
However, it does not consider reviews along with ratings of
users and items.
      </p>
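      <p>The per-user chronological 80/20 split described above can be sketched
as follows; the record layout is an assumed example, not the paper's
actual preprocessing code:</p>

```python
def chronological_split(interactions, train_frac=0.8):
    """Per-user chronological split: the earliest `train_frac` of each
    user's records go to training, the rest to testing.

    interactions: {user: [(timestamp, item, rating, review), ...]}
    """
    train, test = {}, {}
    for user, records in interactions.items():
        ordered = sorted(records)   # timestamp first => chronological order
        cut = int(len(ordered) * train_frac)
        train[user] = ordered[:cut]
        test[user] = ordered[cut:]
    return train, test
```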
      <p>We design the experiments in two settings: regular single-domain
and cross-domain. In the regular single-domain setting, we model
user feedback of one domain to recommend items in the same
domain. In the cross-domain setting, although a baseline may
have been designed for one domain, we use user feedback on both
domains to recommend useful items. For the single-domain models,
we add the prefix “cd” to their names to indicate when they are
provided data from both domains. Specifically, both domains’ datasets
are unified and used for training cdMF and cdNCF.</p>
      <p>
        For the rating prediction task, we compare DHCD against all the
baselines. For the review generation task, we compare DHCD with CF-GCN,
since it is the only baseline with review generation capacity.
Moreover, we also use word-based LSTM (W-LSTM) and character-based
LSTM (C-LSTM) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as baselines for performance comparison.
      </p>
      <p>Significance Testing: Hypothesis testing is used to determine whether the
prediction performance of our model is significantly different from
the baselines. In this test, for each metric, we select the method
whose performance is nearest to our DHCD’s for comparison.</p>
      <p>
        Default Parameter Setting: The number of latent factors for users
and items is set to 20 for all models. For the models using a
multi-layer perceptron (i.e., all models except MF), the number of layers
Q is equal in all domains. The capacities of the multi-layer
perceptron layers are set to 64, 32, 16, and 8. The embedding size of each word
is 50. For models using an LSTM (i.e., DHCD, CF-GCN, C-LSTM,
W-LSTM), the number of LSTM layers is set to 2 and the hidden size
is set to 128. We assume that the rating and review contributions are
equal, so we set λr = λs = 1 and the regularization term λ = 0.01
(see Equation 4). To learn model parameters, we use ADAM [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
with learning rate 10^−4.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Rating Prediction Performance</title>
      <p>
        Performance Measures: We use recall at K (r@K), mean absolute
error (MAE), and root mean square error (RMSE) to measure the
performance of our proposed model and the baselines.
Cross-Domain Results: Table 2 shows our model’s and the baselines’
performance in the cross-domain setting. For r@K, we assume ratings
greater than 3 to be positive. We apply the paired t-test [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for significance
testing. From the table, we observe that our model significantly
outperforms all baselines on all metrics. For example, the
performance of our model is 7% better than that of the CDL model in
terms of MAE. The performance of CDL is generally better than cdNCF’s,
which indirectly suggests that using reviews can enhance
prediction performance. Similarly, CDN outperforms cdNCF,
which emphasizes the importance of modeling items into domains
in a cross-domain design, instead of simply merging different
domains’ data. Our model, DHCD, unifies reviews and ratings in a
cross-domain design. Hence, it achieves higher performance than
the other baselines.
      </p>
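      <p>The three measures are standard; a minimal sketch of how they can be
computed, with the rating &gt; 3 positivity threshold taken from the text
(inputs are illustrative, not the paper's data):</p>

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over paired rating lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error over paired rating lists."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))

def recall_at_k(true_ratings, pred_scores, k, pos_threshold=3.0):
    """r@K: fraction of positively rated items (rating > threshold) that
    appear among the top-K items ranked by predicted score."""
    top_k = set(sorted(pred_scores, key=pred_scores.get, reverse=True)[:k])
    positives = {i for i, r in true_ratings.items() if r > pos_threshold}
    return len(positives & top_k) / len(positives) if positives else 0.0
```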
      <sec id="sec-8-1">
        <title>Table 5: Perplexity comparison between our model and baselines (lower is better)</title>
        <p>[Flattened table fragment; only the Book + Digital Music
column is recoverable: 3.10, 3.12, 3.02, 2.93.]</p>
        <p>Table 4 shows the results of our model in the cold-start setting. Due
to space limitations, we only provide the performance of CDN
for comparison on MAE and RMSE. As shown in the table, DHCD
significantly outperforms CDN in all three datasets. This shows
that using reviews in our model adds extra valuable information
for predicting user ratings, compared to the CDN model that only
uses rating information across the two domains.</p>
      </sec>
      <sec id="sec-8-2">
        <title>Table 4: Prediction Performance in the Cold-start Setting</title>
        <p>Columns: Book + Digital Music; Book + Office Products; Digital
Music + Office Products (MAE, RMSE each).
CDN: 0.767, 0.97; 0.767, 0.986; 0.743, 1.052.
DHCD: 0.751*, 0.94*; 0.75*, 0.922*; 0.725*, 1.031*.
Notation * denotes p &lt; 0.05 in the significance test.</p>
        <p>Single-Domain Results: Table 3 shows the rating prediction
performance of DHCD and the baselines in the single-domain set of experiments.
We only show the results for the Book + Digital Music dataset to save
space, since the experiments on the other two show similar
results. For baselines like MF, NCF, CDL, and CF-GCN, the
training data and test data are from the same domain. For DHCD and
CDN, we use the training data from both domains and report the
performance in each domain separately. Our first observation
from the table is that for baseline models that are
single-domain by design, their performance in a single domain is better
than in the cross-domain setting. For example, the r@10 of MF is 0.03 in the Book
domain, but this method only achieves 0.028 on the combined Book
+ Digital Music domains (Table 2). This indirectly implies that
using heterogeneous data without proper integration can harm the
performance of models. Secondly, in general, our DHCD model
outperforms the baselines in each domain. This implies
that DHCD properly fuses the ratings and reviews
of users and items from different domains under one framework.
Thirdly, in almost all models, but most notably in DHCD,
performance in smaller domains is better than in larger domains.
For example, the Book domain is larger than the Digital Music domain, and
the performance of CF-GCN in the Book domain is worse than
in Digital Music. This suggests that larger domains contain more noise
than smaller ones, and that smaller datasets may benefit
more from information transfer.</p>
        <p>In general, the performance of DHCD is significantly better than
the best baseline in the significance test for both cross-domain and
single-domain settings.</p>
        <p>Rating Prediction in Cold-start: To conduct this experiment, we
keep the same test set, remove users with more than 5 ratings in
both domains from the training set, and use the default parameter settings.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Review Generation Analysis</title>
      <p>In this section, we use perplexity as a measure of review
generation quality; the lower the perplexity, the better the model:
ppx = exp( −(1/N) Σ_{(u,i)} (1/J_ui) Σ_{c=1}^{J_ui} log p(t_c | t_&lt;c, Φ) )    (5)
where (u, i) is a pair of user and item from the test set, N
is the number of reviews in the test set, J_ui is the number of words
in the review between u and i, and Φ denotes the parameters.</p>
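      <p>Equation 5 can be computed from per-word log-probabilities of the test
reviews; a small sketch (the log-probabilities would come from a trained
model and are hypothetical here):</p>

```python
import math

def perplexity(review_log_probs):
    """ppx = exp( -(1/N) * sum over reviews of the mean per-word
    log-probability log p(t_c | t_<c) ).

    review_log_probs: one list of per-word log-probabilities per review.
    """
    n = len(review_log_probs)
    total = sum(sum(lps) / len(lps) for lps in review_log_probs)
    return math.exp(-total / n)
```

      <p>As a sanity check, a model that assigns probability 0.5 to every word
obtains a perplexity of 2.0.</p>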
      <p>Result: Table 5 shows the perplexity of DHCD and the baselines
C-LSTM, W-LSTM, and CF-GCN. As shown in the table, the perplexity
of our model is lower than that of the baselines on the three datasets.
For example, the perplexity of DHCD is 6% better than W-LSTM’s.
This suggests that the latent representations of users and items
learned by the multi-layer perceptron are able to inform the review
generation process. Moreover, the performance of DHCD is better
than that of CF-GCN, which implies that modeling domains separately
is useful for generating reviews.</p>
    </sec>
    <sec id="sec-10">
      <title>The Effect of Reviews in Training</title>
      <p>In this section, we investigate the impact of reviews on training
our model. To do so, we compare the rating regression training
loss of CDN and our model across epochs (Equation 2). CDN can
be considered a simplified version of our model without reviews.
The faster the training loss converges, the better
the method. The parameters are kept at their default values.</p>
      <p>Experimental Results: Figure 2 shows the training loss of rating
regression for DHCD and CDN over 50 epochs on the three datasets.
From the figure, our first observation is that the training loss
decreases as the number of epochs increases and reaches a stable
value after a certain number of epochs. Secondly, both methods appear
to converge to a fixed point on all three datasets. However, DHCD
converges faster than CDN. For instance, after 10 epochs our method
is close to convergence, while CDN needs 25 epochs to show the same
behavior on the Book + Digital Music dataset. From this result, we
conclude that reviews are indeed helpful for the learning of DHCD.</p>
      <p>[Figure 2: Training loss of rating regression of CDN and our model (DHCD) on the three datasets: Book + Digital Music (B_DM), Book + Office Products (B_OP) and Digital Music + Office Products (DM_OP) through epochs.]</p>
      <p>[Figure 3: RMSE and perplexity of DHCD for varying ratio λr/λs on the three datasets: Book + Digital Music (B_DM), Book + Office Products (B_OP) and Digital Music + Office Products (DM_OP). For both metrics, the lower the better.]</p>
    </sec>
    <sec id="sec-10b">
      <title>The Balance between Rating Prediction and Review Generation</title>
      <p>In DHCD, λr and λs control the trade-off between the rating
prediction and review generation tasks. To study their effects, we
keep λs = 1, train with various values of λr, and then measure the
performance of DHCD on the test set. Two metrics, perplexity and
RMSE, are selected for evaluation.</p>
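      <p>The weighting itself reduces to a two-term objective; the sketch below is illustrative (Equation 2 in the paper defines the actual rating loss):</p>

```python
def joint_loss(rating_loss, review_loss, lambda_r=1.0, lambda_s=1.0):
    """Weighted sum of the two task losses; raising lambda_r relative
    to lambda_s shifts training effort toward rating prediction
    (better RMSE, worse perplexity, as observed in Figure 3)."""
    return lambda_r * rating_loss + lambda_s * review_loss

# With lambda_s fixed at 1, sweep the ratio lambda_r / lambda_s
# over the values used in the experiment.
losses = [joint_loss(0.8, 2.9, lambda_r=r)
          for r in (0.1, 0.5, 0.7, 1.0, 1.5, 2.0, 2.5)]
```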
      <p>Experimental Results: In this experiment, the ratio λr/λs takes
values in {0.1, 0.5, 0.7, 1.0, 1.5, 2.0, 2.5}. The results are plotted
in Figure 3. From the figure, we observe that increasing the ratio
λr/λs leads to better RMSE and worse perplexity, since a larger
ratio means more effort is devoted to rating prediction. Moreover,
this pattern is similar across the three datasets.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>In this paper, we have proposed the Deep Hybrid Cross Domain (DHCD)
recommendation system, which captures the reviews and ratings of
users and items across different domains. Through our extensive
experiments, DHCD outperforms the baselines on rating prediction
and review generation tasks.</p>
      <p>There are several directions in which to extend our work. DHCD
does not yet consider the sequence of users' decisions or the social
effect of users' friends on their decisions. These interesting
directions should be studied in the near future, especially to address
data sparsity issues.</p>
    </sec>
  </body>
  <back>
  </back>
</article>