=Paper= {{Paper |id=Vol-1887/paper6 |storemode=property |title=Transfer Learning from APP Domain to News Domain for Dual Cold-Start Recommendation |pdfUrl=https://ceur-ws.org/Vol-1887/paper6.pdf |volume=Vol-1887 |authors=Jixiong Liu,Jiakun Shi,Wanling Cai,Bo Liu,Weike Pan,Qiang Yang,Zhong Ming |dblpUrl=https://dblp.org/rec/conf/recsys/LiuSCLP0M17 }} ==Transfer Learning from APP Domain to News Domain for Dual Cold-Start Recommendation== https://ceur-ws.org/Vol-1887/paper6.pdf
     Transfer Learning from APP Domain to News Domain for
                Dual Cold-Start Recommendation

                             Jixiong Liu† , Jiakun Shi† , Wanling Cai† , Bo Liu‡ , Weike Pan†
                                             Qiang Yang∗‡ and Zhong Ming∗†
                       †
                     College of Computer Science and Software Engineering, Shenzhen University
       ‡
           Department of Computer Science and Engineering, Hong Kong University of Science and Technology
              {1455606137,1033150729,382970614}@qq.com, {bliuab,qyang}@cse.ust.hk,
                                  {panweike,mingz}@szu.edu.cn
                                                            *: corresponding author.

ABSTRACT
News recommendation has become a must-have service for most mobile device users who want to know what has happened in the world. In this paper, we focus on recommending the latest news articles to new users, which involves both the new user cold-start challenge and the new item (i.e., news article) cold-start challenge, and is thus termed dual cold-start recommendation (DCSR). As a response, we propose a solution called neighborhood-based transfer learning (NTL) for this new problem. Specifically, in order to address the new user cold-start challenge, we propose a cross-domain preference assumption, i.e., users with similar app-installation behaviors are likely to have similar tastes in news articles, and then transfer the knowledge of the neighborhood of the cold-start users from an APP domain to a news domain. For the new item cold-start challenge, we design a category-level preference to replace the traditional item-level preference, because the latter is not applicable to the new items in our problem. We then conduct empirical studies on a real industry dataset with both users' app-installation behaviors and news-reading behaviors, and find that our NTL is able to deliver news articles more accurately than other methods on different ranking-oriented evaluation metrics.

CCS Concepts
•Information systems → Personalization; •Human-centered computing → Collaborative filtering;

Keywords
Transfer Learning; News Recommendation; Cold-Start Recommendation

RecSysKTL Workshop @ ACM RecSys '17, August 27, 2017, Como, Italy
© 2017 Copyright is held by the author(s)

1.   INTRODUCTION
Intelligent recommendation systems [5] have become a ubiquitous service in our daily life, saving us a lot of time in finding proper information such as music, goods and news articles. For instance, personalized news recommendation [1, 2] has been one of the must-have services for most mobile device users, playing an important role in helping users keep up with current affairs in the world. In this paper, we focus on recommending the latest news articles to new users, i.e., users who are newly registered in a certain news recommendation service and have not read any news articles before, while the news articles themselves have not been read by any users before. We term this dual cold-start recommendation (DCSR), denoting both cold-start users and cold-start items.
For the dual cold-start problem, previous news recommendation methods [1, 2] are not applicable, because they rely on users' historical reading behaviors and news articles' content information, which are not available in our case.
We turn to address the cold-start recommendation problem from a transfer learning [3, 4] perspective. Although there are no behaviors for the cold-start users and cold-start items in the news domain, there may be some other related domains with users' behaviors. Specifically, we leverage knowledge from a related domain, i.e., the APP domain, where the users' app-installation behaviors are available. We find that most cold-start users in the news domain have already installed some apps, which may be helpful in determining their preferences for news articles. In particular, we assume that users with similar app-installation behaviors are likely to have similar interests in some news topics. In other words, close neighbors in the APP domain are likely to be close neighbors in the news domain.
With the above cross-domain preference assumption, we propose to take the neighborhood in the APP domain as the knowledge and transfer it to the target domain of news articles. Specifically, we design a neighborhood-based transfer learning (NTL) solution that transfers knowledge of the neighborhood from the APP domain to the news domain, which addresses the new user cold-start challenge. With the neighborhood, some well-studied neighborhood-based recommendation methods become applicable for news recommendation.
We conduct empirical studies on a real industry dataset in order to verify our cross-domain preference assumption and the effectiveness of our transfer learning solution. Experimental results show that the two domains of apps and news articles are indeed related and can share some knowledge for preference learning.




2.   OUR SOLUTION

Figure 1: An illustration of neighborhood-based transfer learning (NTL) for dual cold-start recommendation (DCSR).

2.1 Problem Definition
In our studied news recommendation problem, we have two domains, including an APP domain and a news domain.
Firstly, in the APP domain, we have a set of triples, i.e., (u, g, G_{ug}), denoting that user u has installed apps belonging to genre g a total of G_{ug} times. The data of the APP domain can then be represented as a user-genre matrix G, as shown in Figure 1.
Secondly, in the news domain, we have a user-item matrix R denoting whether a user has read an item. Each item i is associated with a level-1 category c_1(i) and a level-2 category c_2(i). We thus have a set of quadruples, i.e., (u, i, c_1(i), c_2(i)), denoting that user u has read an item i belonging to c_1(i) and c_2(i). Finally, we have a user-category matrix C after pre-processing, where each entry denotes the number of items belonging to a certain category that a user has read.
Our goal is to recommend a ranked list of new items (i.e., the latest news articles) to each new user, who has not read any items before. We can see that it is both a new user cold-start and a new item cold-start problem, which is thus termed dual cold-start recommendation (DCSR). Note that we only make use of items' category information, but not content information.
We list some notations in Table 1.

2.2 Challenges
The main difficulty of the DCSR problem is the lack of preference data for new users and new items. Specifically, there are two challenges: (i) the new user cold-start challenge, i.e., the target users (to whom we will provide recommendations) have not read any items before; and (ii) the new item cold-start challenge, i.e., the target items (that we will recommend to the target users) are totally new to all users. Under such a situation, most existing recommendation algorithms are not applicable.

2.3 Neighborhood-based Transfer Learning
In most recommendation methods [5], the user-user (or item-item) similarity is a central concept, because the neighborhood can be constructed for like-minded users' preference aggregation and then for the target user's preference prediction. Mathematically, the preference prediction rule for user u and item i is as follows,

    \hat{r}_{u,i} = \frac{1}{|N_u|} \sum_{u' \in N_u} \hat{r}_{u',i},    (1)

where N_u is a set of nearest neighbors of user u in terms of a certain similarity measurement such as cosine similarity, and \hat{r}_{u',i} is the estimated preference of user u' (a close neighbor of user u) for item i. The aggregated and normalized score \hat{r}_{u,i} is taken as the preference of user u for item i, which is further used for item ranking and top-K recommendation.
For our studied dual cold-start recommendation problem, we cannot build correlations between a cold-start user in the test data and a warm-start user in the training data using the data from the news domain only. The main idea of our transfer learning [3] solution is to leverage the correlations among the users in the APP domain, with the assumption that users with similar app-installation behaviors are likely to be similar in news taste. For instance, two users with installed apps of the same genre (e.g., business) may both prefer news articles on topics like finance.
With the cross-domain preference assumption, we first calculate the cosine similarity between a cold-start user u and a warm-start user u' in the APP domain as follows,

    s_{u,u'} = \frac{G_{u\cdot} G_{u'\cdot}^T}{\sqrt{G_{u\cdot} G_{u\cdot}^T} \sqrt{G_{u'\cdot} G_{u'\cdot}^T}},    (2)

where G_{u\cdot} is the row vector w.r.t. user u from the user-genre matrix G. Once we have calculated the cosine similarity, for each cold-start user u, we first remove users with a small similarity value (e.g., s_{u,u'} < 0.1), and then take some (e.g., 100) most similar users to construct a neighborhood N_u.
For the item-level preference \hat{r}_{u',i} in Eq.(1), we are not able to obtain such a score directly, because the item i is new to all users, including the warm-start users and the target cold-start user u. We thus propose to approximate the item-level preference by a category-level preference,

    \hat{r}_{u',i} \approx \hat{r}_{u',c(i)},    (3)

where c(i) can be the level-1 category or the level-2 category. We then have two types of category-level preferences,

    \hat{r}_{u',c(i)} = \hat{r}_{u',c_1(i)} = N_{u',c_1(i)},    (4)
    \hat{r}_{u',c(i)} = \hat{r}_{u',c_2(i)} = N_{u',c_2(i)},    (5)

where N_{u',c_1(i)} and N_{u',c_2(i)} denote the number of read items (by user u') belonging to the level-1 category c_1(i) and the level-2 category c_2(i), respectively.
Finally, with Eqs.(3-5), we can rewrite Eq.(1) as follows,

    \hat{r}_{u,i} \approx \frac{1}{|N_u|} \sum_{u' \in N_u} N_{u',c_1(i)},    (6)

    \hat{r}_{u,i} \approx \frac{1}{|N_u|} \sum_{u' \in N_u} N_{u',c_2(i)},    (7)

which will be used for preference prediction in our empirical studies. Specifically, the neighborhood N_u addresses the new user cold-start challenge, and the category-level preference N_{u',c_1(i)} or N_{u',c_2(i)} addresses the new item cold-start challenge.
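The two steps of NTL, i.e., building the neighborhood in the APP domain via Eq.(2) and predicting via the category-level aggregation of Eqs.(6)-(7), can be sketched in Python as follows. This is a minimal illustration under our own naming, not the implementation used in the experiments; the similarity threshold 0.1 and the neighborhood size 100 follow the values given in the text.

```python
import numpy as np

def build_neighborhood(G, u, warm_users, threshold=0.1, k=100):
    """Top-k warm-start neighbors of cold-start user u, by cosine
    similarity over the app-genre rows of G (Eq. (2))."""
    gu = G[u]
    sims = {}
    for v in warm_users:
        gv = G[v]
        denom = np.linalg.norm(gu) * np.linalg.norm(gv)
        if denom == 0:
            continue
        s = float(gu @ gv) / denom
        if s >= threshold:  # drop weakly similar users
            sims[v] = s
    # keep the k most similar warm-start users
    return sorted(sims, key=sims.get, reverse=True)[:k]

def predict(Nu, counts, c):
    """Category-level prediction of Eq. (6)/(7): average over the
    neighborhood of the number of items each neighbor has read in
    category c; counts[(v, c)] plays the role of N_{v,c} in Table 1."""
    if not Nu:
        return 0.0
    return sum(counts.get((v, c), 0) for v in Nu) / len(Nu)
```

For a new article i, a cold-start user's score is then predict(Nu, counts, c1(i)) (NTL-C1) or predict(Nu, counts, c2(i)) (NTL-C2), and the 15 highest-scoring test articles form the recommendation list.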




Table 1: Some notations and explanations. Note that when a user u has read the same article more than once, we only count it once in calculating N_{u,c_1} and N_{u,c_2}.

    u                                              user id
    i                                              item (i.e., news article) id
    g                                              genre id of the apps
    C_1                                            a set of level-1 categories, c_1 ∈ C_1
    C_2                                            a set of level-2 categories, c_2 ∈ C_2
    N_{u,c_1}                                      the number of read items (by user u) belonging to a level-1 category c_1
    N_{u,c_2}                                      the number of read items (by user u) belonging to a level-2 category c_2
    N_{c_1} = \sum_u N_{u,c_1}                     the number of read items (by all users) belonging to a level-1 category c_1
    N_{c_2} = \sum_u N_{u,c_2}                     the number of read items (by all users) belonging to a level-2 category c_2
    p_{c_1} = N_{c_1} / \sum_{c_1' \in C_1} N_{c_1'}    the popularity of the level-1 category c_1 among the users
    p_{c_2} = N_{c_2} / \sum_{c_2' \in C_2} N_{c_2'}    the popularity of the level-2 category c_2 among the users
    N_u                                            a set of neighbors of user u


3.   EXPERIMENTAL RESULTS

3.1 Dataset and Evaluation Metrics
In our empirical studies, we use a real industry dataset, which consists of an APP domain and a news domain.
APP Domain. In the auxiliary domain, i.e., the APP domain, we have 827,949 users and 53 description terms (i.e., genres) of the users' installed mobile apps, where the genres are from Google Play. Considering our target task of news recommendation, we removed 14 undiscriminating or irrelevant genres such as tools, communication, social, entertainment, productivity, weather, dating, etc. Finally, we have a matrix G with 827,949 users (or rows) and 39 genres (or columns), where each entry represents the number of times that a user has installed apps belonging to a genre.
News Domain. In the target domain, i.e., the news domain, we have two sets of data, including a training set and a test set. The training data span from 10 January 2017 to 30 January 2017, and contain 806,167 users, 747,643 items (i.e., news articles), and 16,199,385 unique (user, item) pairs. We can see that a user read about 16199385/806167 = 20.09 articles on average from 10 January 2017 to 30 January 2017. The test data are from 31 January 2017, and contain 3,597 new users, 28,504 new items (i.e., news articles), and 4,813 unique (user, item) pairs. We can see that a cold-start user read about 4813/3597 = 1.34 articles on 31 January 2017. Note that we have |C_1| = 26 level-1 categories and |C_2| = 222 level-2 categories for the items in the news domain.
For performance evaluation, we adopt some commonly used evaluation metrics in ranking-oriented recommendation, such as precision, recall, F1, NDCG and 1-call. Specifically, we study the average performance of the top-15 recommended list generated for each cold-start user in the test data.

3.2 Baselines and Parameter Settings
We compare our proposed transfer learning solution with a random method and two popularity-based methods using category information.

• Random recommendation (Random). In Random, we randomly select K = 15 items in the test data for each cold-start user.

• Popularity-based ranking via level-1 category (PopRank-C1). In PopRank-C1, we first calculate the popularity p_{c_1} of each level-1 category c_1 ∈ C_1 in the training data, and then use \hat{r}_i = p_{c_1(i)} in Table 1 as the score to rank each item (i.e., article) i in the test data. For the most popular level-1 category, there may be more than K = 15 items (i.e., articles) in the test data; we then randomly take K items (i.e., articles) from that level-1 category for recommendation.

• Popularity-based ranking via level-2 category (PopRank-C2). In PopRank-C2, we use \hat{r}_i = p_{c_2(i)} in Table 1 as the prediction rule, similar to that of PopRank-C1.

For the number of neighbors in our neighborhood-based transfer learning method, we first fix it as 100, and then change it to 50 and 150 in order to study its impact. We denote our transfer learning solution with the level-1 category as NTL-C1 and that with the level-2 category as NTL-C2, where their prediction rules are shown in Eq.(6) and Eq.(7), respectively. Note that for Random, PopRank-C1, PopRank-C2, and NTL with randomly selected neighbors, we repeat the experiments 10 times and report the average results.

3.3 Results
We report the main results in Table 2. From Table 2, we have the following observations:

• The overall performance ordering is PopRank-C1, Random, PopRank-C2 ≪ NTL-C2 < NTL-C1, which clearly shows the effectiveness of our proposed transfer learning solution to the challenging dual cold-start recommendation problem.

• The performance of PopRank-C2 and PopRank-C1 is rather poor in comparison with our proposed solution. The reason is that popularity-based methods are non-personalized and simply select the single most popular level-1 or level-2 category for all users, which ignores the differences in users' news-reading preferences.

• For the relative performance of NTL-C1 and NTL-C2, we can see that NTL-C1 performs better, as expected, because the level-1 category may introduce a stronger smoothing effect for the cold-start problem.
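For completeness, the five metrics at K = 15 can be computed per user as below and then averaged over the cold-start users. This is our own sketch of the standard binary-relevance definitions (with 1-call@K being whether at least one actually-read item appears in the top K), not code from our experiments.

```python
import math

def topk_metrics(ranked, relevant, k=15):
    """Binary-relevance ranking metrics for one user.
    ranked: recommended item ids in rank order;
    relevant: set of items the user actually read in the test data."""
    topk = ranked[:k]
    hits = [1 if i in relevant else 0 for i in topk]
    prec = sum(hits) / k
    rec = sum(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    # DCG with binary gains, normalized by the ideal DCG
    dcg = sum(h / math.log2(r + 2) for r, h in enumerate(hits))
    ideal = sum(1 / math.log2(r + 2)
                for r in range(min(k, len(relevant))))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    one_call = 1.0 if any(hits) else 0.0
    return prec, rec, f1, ndcg, one_call
```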




Table 2: Recommendation performance of random recommendation, popularity-based ranking, and our NTL on cold-start users.

    Method        Prec@15      Rec@15       F1@15        NDCG@15      1-call@15
    Random        5.56E-05     5.84E-04     9.78E-05     2.27E-04     8.34E-04
    PopRank-C1    5.00E-05     5.59E-04     9.02E-05     2.38E-04     7.51E-04
    PopRank-C2    1.46E-04     1.74E-03     2.65E-04     6.48E-04     2.20E-03
    NTL-C1        0.0053       0.0645       0.0095       0.0255       0.0734
    NTL-C2        0.0040       0.0501       0.0073       0.0206       0.0567



We further study the impact of the neighborhood size. The results of our NTL-C1 using 50, 100 and 150 neighbors are shown in Figure 2. We can see that the results are relatively stable across different numbers of neighbors, and configuring it as 100 usually produces the best performance.

Figure 2: Recommendation performance of our NTL with level-1 category (NTL-C1) using different neighborhood sizes (bars show Prec@15, Rec@15, F1@15, NDCG@15 and 1-call@15 for 50, 100 and 150 neighbors).

In order to gain a deeper understanding of the transferred neighborhood, we study the performance of randomly choosing the same number (i.e., 100) of neighbors in our NTL-C1. We report the results in Figure 3, from which we have the following observations:

• The neighborhood constructed using the app-installation behaviors is better than the random counterpart, which shows that the two domains are related and knowledge can indeed be transferred from one domain to the other.

• The difference between the two types of neighborhood is not as large as that between the popularity-based methods and our NTL in Table 2, which can be explained by the fact that a large portion of users' preferences or tastes in news articles are similar.

Figure 3: Recommendation performance of our NTL with level-1 category (NTL-C1) using random neighborhood and transferred neighborhood.

4.   CONCLUSIONS AND FUTURE WORK
In this paper, we study an important and challenging news recommendation problem called dual cold-start recommendation (DCSR), which aims to recommend the latest news articles (cold-start items) to newly registered users (cold-start users). Specifically, we propose a neighborhood-based transfer learning (NTL) solution, which is able to address the new user cold-start challenge and the new item cold-start challenge via the transferred neighborhood from the APP domain and the category-level preferences in the news domain, respectively. Empirical results on a real industry dataset show that our NTL performs significantly more accurately than the few applicable methods, i.e., popularity-based ranking using category information.
For future work, we are interested in selecting some representative genres and categories in the two domains and building a mapping between them, which will be further used to study the neighborhood of the items.

5.   ACKNOWLEDGMENT
We thank the support of Natural Science Foundation of China (NSFC) Nos. 61502307 and 61672358, China National 973 project No. 2014CB340304, and Hong Kong CERG projects Nos. 16211214, 16209715 and 16244616.

6.   REFERENCES
[1] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 271–280, 2007.
[2] J. Liu, P. Dolan, and E. R. Pedersen. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI '10, pages 31–40, 2010.
[3] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[4] W. Pan. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447–453, 2016.
[5] F. Ricci, L. Rokach, and B. Shapira. Recommender Systems Handbook (Second Edition). Springer, 2015.



