<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross Domain Recommendation Using Vector Space Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Masahiro Kazama</string-name>
          <email>masahiro_kazama@r.recruit.co.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>István Varga</string-name>
          <email>vistvan@r.recruit.co.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Recruit Technologies Co., Ltd.</institution>
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>The cold start problem, frequent in recommender systems, arises when we do not know enough about our users in a specific domain (e.g., the user has not rated anything yet, or there are no user activities). In this paper we present a simple and robust transfer learning approach in which we model users' behavior in a source domain and transfer that knowledge to a new, target domain. First, we vectorize the items by using word2vec for each dataset independently. Second, we calculate the transformation matrix that connects the source dataset to the target dataset by using their common users.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>The economic advantages of cross-selling products
between different domains (e.g., movies or books) are
appealing, and cross-domain recommendation has become a hot
topic in recent years. However, while it is not always trivial
to identify patterns that successfully model user
behavior, doing so across domains is even more difficult. If a user
tends to like David Lynch movies, we may be able to
recommend “The Elephant Man”, but what does this knowledge
translate to in a similar yet different domain, such as
books? What about an apparently unrelated
domain, such as restaurants?</p>
      <p>In this paper we propose a simple and robust
transfer-learning-based approach that can successfully model the
behavior patterns of users across different domains, showing
promising results even with only a small number of common
users.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORKS</title>
      <p>
        In recent years, numerous transfer learning approaches
have been proposed. Sahebi and Brusilovsky [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] use canonical
correlation analysis in a cross-domain recommendation setting,
but their method can only recommend items that have been rated
by the common users. Our work is partly inspired by, and
extends, the work of Zhang et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They attempt
to identify correspondences between words across different
timeframes by first building an independent vector space
for each timeframe using word2vec [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Next, using
anchor words whose meaning does not change significantly over
time, they learn a transformation matrix between the two
vector spaces. This matrix establishes the correspondence between the
two timeframes, identifying word pairs that are semantically
similar in their respective timeframes, such as Walkman and
iPod. We can apply the same methodology to recommender
systems to address the cold start problem or to fill in
missing information: ratings from a source domain
can be leveraged in place of ratings in a new, target domain,
with the common users acting as anchors.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. PROPOSED METHOD</title>
      <p>Consider the case where we have a source domain and a
new, target domain with no common items, but with common
users. Such cases are very frequent in real-life situations.
We propose a cross-domain recommendation methodology that
can accurately recommend items from the entire inventory
of the target domain by leveraging the common users. We
illustrate the scope of our paper in Figure 1. Our method
captures analogies in user behavior across domains by
modeling user preference in the source domain and transferring this
knowledge to a new, target domain. The only prerequisite for our
proposal is the existence of a relatively small group of
common users.</p>
      <p>The proposed method consists of two steps. First, we
generate a vector representation for each item, separately for
the source and target domains. We use word2vec for
this generation process. Word2vec is usually employed for
text analysis, but in our process we apply it to user rating logs,
treating items as words, with each user’s positively rated items
organized into a paragraph. Second, using the common users,
we learn a transformation matrix M between the source and
target domains.</p>
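      <p>The first step can be illustrated with a minimal sketch of how rating logs might be turned into word2vec input. The record layout (user ID, item ID, stars), the function name build_item_sequences and the ordering of items within a paragraph are assumptions made for illustration; the threshold of 4 stars follows the definition of positive ratings used in the experiments. The resulting sequences would then be passed to a word2vec implementation, whose hyperparameters are not specified here.</p>
      <preformat>
```python
from collections import defaultdict


def build_item_sequences(ratings, min_rating=4):
    """Group each user's positively rated items into one 'paragraph' of item IDs.

    ratings: iterable of (user_id, item_id, stars) tuples (hypothetical layout).
    Items keep the order in which they appear in the log (an assumption).
    """
    sequences = defaultdict(list)
    for user_id, item_id, stars in ratings:
        if stars >= min_rating:  # keep positive ratings only
            sequences[user_id].append(item_id)
    return dict(sequences)


# Toy rating log for one domain; IDs are made up.
toy_ratings = [
    ("u1", "r1", 5),
    ("u1", "r2", 2),
    ("u2", "r3", 4),
    ("u1", "r4", 4),
]
paragraphs = build_item_sequences(toy_ratings)
print(paragraphs)
```
      </preformat>
      <p>Running this once per domain yields two independent corpora, one for the source items and one for the target items, which matches the requirement that the two vector spaces be built separately.</p>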
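      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>M = \arg\min_{M} \sum_{i=1}^{N} \left\| M u_i^{s} - u_i^{t} \right\|</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>u_i^{s} \in \mathbb{R}^{K}</tex-math></inline-formula> is the vector of user <italic>i</italic> in the source vector
space and <inline-formula><tex-math>u_i^{t}</tex-math></inline-formula> is the vector of the same user in the target vector space. User vectors are
calculated as the mean of the vectors of the items rated by the user as
positive. <italic>K</italic> is the dimension of each vector, <italic>N</italic> is the
number of common users, and <inline-formula><tex-math>M \in \mathbb{R}^{K \times K}</tex-math></inline-formula> is the transformation
matrix between the source and target spaces. To recommend
target-domain items from source-domain data, we calculate the
user vector in the source space and convert it to the target vector space by
using the transformation matrix <italic>M</italic>. Once this conversion is
done, the converted user vector can be compared to the
item vectors in the new domain, and the items with the most similar
vectors can be recommended to this user. In this paper, the
similarity measure used is the cosine similarity.</p>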
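      <p>The second step and the recommendation procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it minimizes the sum of squared norms (ordinary least squares), which has a closed-form solution via the normal equations, and it assumes the common users' source vectors span the whole K-dimensional space so that the system is solvable. All function names and the toy vectors are hypothetical.</p>
      <preformat>
```python
import math


def mean_vector(vectors):
    """User vector: mean of the vectors of the items the user rated positively."""
    k = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(k)]


def solve(a, b):
    """Solve a @ x = b (both K x K) by Gauss-Jordan elimination with pivoting.

    Assumes a is invertible; a singular matrix raises ZeroDivisionError.
    """
    k = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def fit_transform(source_users, target_users):
    """Least-squares M with M @ s_i close to t_i for every common user i.

    Normal equations: (sum s s^T) M^T = sum s t^T (squared-loss assumption).
    """
    k = len(source_users[0])
    pairs = list(zip(source_users, target_users))
    a = [[sum(s[r] * s[c] for s, _ in pairs) for c in range(k)] for r in range(k)]
    bt = [[sum(s[r] * t[c] for s, t in pairs) for c in range(k)] for r in range(k)]
    mt = solve(a, bt)  # this is M transposed
    return [[mt[c][r] for c in range(k)] for r in range(k)]


def apply_matrix(m, v):
    """Map a source-space vector into the target space."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def recommend(m, source_user_vec, target_items, top_n):
    """Rank target items by cosine similarity to the converted user vector."""
    projected = apply_matrix(m, source_user_vec)
    ranked = sorted(target_items,
                    key=lambda item: cosine(projected, target_items[item]),
                    reverse=True)
    return ranked[:top_n]


# Toy data with K = 2: target vectors are the source vectors rotated by 90 degrees.
source_users = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_users = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
M = fit_transform(source_users, target_users)

target_items = {"cafe": [0.0, 1.0], "mall": [1.0, 0.0], "bar": [-1.0, 0.0]}
new_user = mean_vector([[1.0, 0.0], [1.0, 0.0]])  # liked two source items
print(recommend(M, new_user, target_items, top_n=1))  # ['cafe']
```
      </preformat>
      <p>In the toy data the fitted matrix recovers the rotation exactly because the target vectors are an exact linear image of the source vectors; with real user vectors the fit is only approximate, and a numerical library would normally replace the hand-written solver.</p>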
    </sec>
    <sec id="sec-4">
      <title>4. EXPERIMENTS</title>
      <p>In our experiments we used the ratings from the Yelp
dataset<fn id="fn1"><label>1</label><p>https://www.yelp.com/dataset_challenge</p></fn>. User ratings range from 1 to 5, but we only
consider positive ratings (4 and 5), ignoring the rest. Items are
tagged with various labels (e.g., Restaurants, Shopping).</p>
      <p>We validated our proposal on the 3 most frequent
categories in terms of item count: Restaurants, Shopping and
Food. For each source-target domain pair, we removed the items
that were labeled with both categories. For example, in the
Restaurants-Shopping pairing, after removing the common
items, we retained 131,825 and 10,899 users respectively,
of whom 8,642 were common users. Table 1
summarizes the statistics of the above categories.</p>
      <p>As test data, we used 20% of the common users in all
source-target pairing configurations (e.g., 1,728 users in the
Restaurants-Shopping setting). The rest of the data was used for
training. To observe the effect of the number of
common users, we incrementally increased the number of
common users in our training data, starting from 10% of the
training data and increasing by 10% at each step. As the
evaluation metric we used recall@100, as we are interested in
correctly identifying the items the users actually rate as
positive in the target domain, using only the data from the source
domain. Besides recall@100, we experimented with other
metrics such as recall@10 and recall@50, and precision@10, precision@50 and
precision@100; the same tendencies were observed in all
experiments.</p>
      <p>As a baseline, we treat the source and target
domains as a single dataset, generating one common vector
space model using word2vec in the same way as in our
proposal, with the recommended items being the ones whose
vectors are closest to the user’s vector.</p>
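      <p>The evaluation protocol can be sketched as below. This is one possible reading of the setup, with hypothetical function names: common users are shuffled and split 80/20, the training users are consumed in growing 10% steps, and recall@k is the fraction of a test user's positively rated target items that appear among the top k recommendations. The random seed and the assumption that a ranking contains no duplicate items are choices made for this illustration.</p>
      <preformat>
```python
import random


def split_common_users(common_users, test_fraction=0.2, seed=0):
    """Shuffle the common users and hold out a fraction of them for testing."""
    users = sorted(common_users)
    random.Random(seed).shuffle(users)
    n_test = int(len(users) * test_fraction)
    return users[n_test:], users[:n_test]  # (train, test)


def training_subsets(train_users, steps=10):
    """Yield growing prefixes with 10%, 20%, ..., 100% of the training users."""
    for step in range(1, steps + 1):
        yield train_users[: len(train_users) * step // steps]


def recall_at_k(ranked_items, relevant_items, k):
    """Share of the relevant items found in the top-k of a duplicate-free ranking."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


# 8,642 common users, as in the Restaurants-Shopping pairing.
train_users, test_users = split_common_users(range(8642))
print(len(train_users), len(test_users))  # 6914 1728

subset_sizes = [len(s) for s in training_subsets(train_users)]
print(subset_sizes)

# Toy ranking: one of the three relevant items is in the top 2.
print(recall_at_k(["a", "b", "c", "d"], {"b", "d", "z"}, k=2))
```
      </preformat>
      <p>Averaging recall_at_k over all test users with k = 100 gives a recall@100 figure of the kind reported in the paper; the precision variants differ only in dividing the number of hits by k instead of by the number of relevant items.</p>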
    </sec>
    <sec id="sec-5">
      <title>5. CONCLUSIONS AND FUTURE WORK</title>
      <p>We proposed a simple but robust transfer learning
approach that can bridge the gap between differing domains.
We found that our method is most effective (1) when the
number of users in the source domain is smaller than the
number of users in the target domain and (2) when the
number of common users across the two domains is small.</p>
      <p>We are investigating the effectiveness of other item vector
representation methods besides word2vec, such as matrix
factorization, within our transfer learning framework.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Proceedings of NIPS 2013</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahebi</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          .
          <article-title>It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering</article-title>
          .
          <source>In Proceedings of RecSys 2015</source>
          , pages
          <fpage>131</fpage>
          -
          <lpage>138</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          .
          <article-title>Omnia mutantur, nihil interit: Connecting past with present by finding corresponding terms across time</article-title>
          .
          <source>In Proceedings of ACL 2015</source>
          , pages
          <fpage>645</fpage>
          -
          <lpage>655</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>