=Paper=
{{Paper
|id=Vol-1887/paper6
|storemode=property
|title=Transfer Learning from APP Domain to News Domain for Dual Cold-Start Recommendation
|pdfUrl=https://ceur-ws.org/Vol-1887/paper6.pdf
|volume=Vol-1887
|authors=Jixiong Liu,Jiakun Shi,Wanling Cai,Bo Liu,Weike Pan,Qiang Yang,Zhong Ming
|dblpUrl=https://dblp.org/rec/conf/recsys/LiuSCLP0M17
}}
==Transfer Learning from APP Domain to News Domain for Dual Cold-Start Recommendation==
Jixiong Liu†, Jiakun Shi†, Wanling Cai†, Bo Liu‡, Weike Pan†, Qiang Yang∗‡ and Zhong Ming∗†
† College of Computer Science and Software Engineering, Shenzhen University
‡ Department of Computer Science and Engineering, Hong Kong University of Science and Technology
{1455606137,1033150729,382970614}@qq.com, {bliuab,qyang}@cse.ust.hk, {panweike,mingz}@szu.edu.cn
∗: corresponding author

RecSysKTL Workshop @ ACM RecSys '17, August 27, 2017, Como, Italy. © 2017 Copyright is held by the author(s).

ABSTRACT

News recommendation has become a must-have service for most mobile device users who want to know what has happened in the world. In this paper, we focus on recommending the latest news articles to new users, a problem that consists of a new user cold-start challenge and a new item (i.e., news article) cold-start challenge, and is thus termed dual cold-start recommendation (DCSR). As a response, we propose a solution called neighborhood-based transfer learning (NTL) for this new problem. Specifically, in order to address the new user cold-start challenge, we propose a cross-domain preference assumption, i.e., users with similar app-installation behaviors are likely to have similar tastes in news articles, and then transfer the knowledge of the neighborhood of the cold-start users from an APP domain to a news domain. For the new item cold-start challenge, we design a category-level preference to replace the traditional item-level preference, because the latter is not applicable to the new items in our problem. We then conduct empirical studies on a real industry dataset with both users' app-installation behaviors and news-reading behaviors, and find that our NTL is able to deliver news articles more accurately than other methods on different ranking-oriented evaluation metrics.

CCS Concepts

• Information systems → Personalization; • Human-centered computing → Collaborative filtering

Keywords

Transfer Learning; News Recommendation; Cold-Start Recommendation
1. INTRODUCTION

Intelligent recommendation systems [5] have become a ubiquitous service in our daily life, saving us a lot of time in finding proper information such as music, goods and news articles. For instance, personalized news recommendation [1, 2] has been one of the must-have services for most mobile device users, playing an important role in helping users keep up with current affairs in the world. In this paper, we focus on recommending the latest news articles to new users, i.e., users who are newly registered in a certain news recommendation service and have not read any news articles before, where the news articles themselves have not been read by any users before. We term this dual cold-start recommendation (DCSR), denoting both cold-start users and cold-start items.

For the dual cold-start problem, previous news recommendation methods [1, 2] are not applicable, because they rely on users' historical reading behaviors and news articles' content information, which are not available in our case.

We instead address the cold-start recommendation problem from a transfer learning [3, 4] view. Although there are no behavior data for the cold-start users and cold-start items in the news domain, there may be some other related domains with users' behaviors. Specifically, we leverage knowledge from a related domain, i.e., the APP domain, where the users' app-installation behaviors are available. We find that most cold-start users in the news domain have already installed some apps, which may be helpful in determining their preferences for news articles. In particular, we assume that users with similar app-installation behaviors are likely to have similar interests in some news topics. In other words, close neighbors in the APP domain are likely to be close neighbors in the news domain.

With the above cross-domain preference assumption, we propose to take the neighborhood in the APP domain as the knowledge to transfer to the target domain of news articles. Specifically, we design a neighborhood-based transfer learning (NTL) solution that transfers knowledge of the neighborhood from the APP domain to the news domain, which addresses the new user cold-start challenge. With the neighborhood, some well-studied neighborhood-based recommendation methods become applicable for news recommendation. We conduct empirical studies on a real industry dataset in order to verify our cross-domain preference assumption and the effectiveness of our transfer learning solution. Experimental results show that the two domains of apps and news articles are indeed related and can share some knowledge for preference learning.

2. OUR SOLUTION

2.1 Problem Definition

In our studied news recommendation problem, we have two domains: an APP domain and a news domain.

Firstly, in the APP domain, we have a set of triples, i.e., (u, g, G_{ug}), denoting that user u has installed G_{ug} mobile apps belonging to genre g. The data of the APP domain can then be represented as a user-genre matrix G, as shown in Figure 1.

Figure 1: An illustration of neighborhood-based transfer learning (NTL) for dual cold-start recommendation (DCSR). [Figure omitted.]

Secondly, in the news domain, we have a user-item matrix R denoting whether a user has read an item. Each item i is associated with a level-1 category c_1(i) and a level-2 category c_2(i). We thus have a set of quadruples, i.e., (u, i, c_1(i), c_2(i)), denoting that user u has read an item i belonging to c_1(i) and c_2(i). Finally, we have a user-category matrix C after pre-processing, where each entry denotes the number of items belonging to a certain category that a user has read.

Our goal is to recommend a ranked list of new items (i.e., the latest news articles) to each new user, who has not read any items before. We can see that this is both a new user cold-start and a new item cold-start problem, which is thus termed dual cold-start recommendation (DCSR). Note that we only make use of the items' category information, not their content information. We put some notations in Table 1.

Table 1: Some notations and explanations. Note that when a user u has read the same article more than once, we count it only once in calculating N_{u,c_1} and N_{u,c_2}.

u: user id
i: item (i.e., news article) id
g: genre id of the apps
C_1: the set of level-1 categories, c_1 ∈ C_1
C_2: the set of level-2 categories, c_2 ∈ C_2
N_{u,c_1}: the number of read items (by user u) belonging to a level-1 category c_1
N_{u,c_2}: the number of read items (by user u) belonging to a level-2 category c_2
N_{c_1} = \sum_u N_{u,c_1}: the number of read items (by all users) belonging to a level-1 category c_1
N_{c_2} = \sum_u N_{u,c_2}: the number of read items (by all users) belonging to a level-2 category c_2
p_{c_1} = N_{c_1} / \sum_{c_1' \in C_1} N_{c_1'}: the popularity of the level-1 category c_1 among the users
p_{c_2} = N_{c_2} / \sum_{c_2' \in C_2} N_{c_2'}: the popularity of the level-2 category c_2 among the users
N_u: the set of neighbors of user u
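To make the pre-processing above concrete, the following is a minimal Python sketch of how the user-genre matrix G and the level-1 user-category counts N_{u,c_1} could be built from the raw triples and quadruples. The function names, integer index encodings and dense-matrix layout are illustrative assumptions for exposition, not the published implementation.

```python
import numpy as np

def build_user_genre_matrix(triples, n_users, n_genres):
    """triples: iterable of (user_idx, genre_idx, install_count)."""
    G = np.zeros((n_users, n_genres))
    for u, g, count in triples:
        G[u, g] = count  # G_ug: number of installed apps of genre g
    return G

def build_user_category_counts(quadruples, n_users, n_categories):
    """quadruples: iterable of (user_idx, item_idx, c1_idx, c2_idx).
    Returns N with N[u, c1] = number of distinct articles of level-1
    category c1 read by user u; repeated reads of the same article
    count once, as in the note under Table 1."""
    N = np.zeros((n_users, n_categories))
    seen = set()
    for u, i, c1, _c2 in quadruples:
        if (u, i) not in seen:  # count each (user, item) pair once
            seen.add((u, i))
            N[u, c1] += 1
    return N
```

The level-2 counts N_{u,c_2} are built in exactly the same way using the c2 index.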
2.2 Challenges

The main difficulty of the DCSR problem is the lack of preference data for new users and new items. Specifically, there are two challenges: (i) the new user cold-start challenge, i.e., the target users (to whom we will provide recommendations) have not read any items before; and (ii) the new item cold-start challenge, i.e., the target items (that we will recommend to the target users) are totally new to all users. Under such a situation, most existing recommendation algorithms are not applicable.

2.3 Neighborhood-based Transfer Learning

In most recommendation methods [5], the user-user (or item-item) similarity is a central concept, because the neighborhood can be constructed for like-minded users' preference aggregation and then for the target user's preference prediction. Mathematically, the preference prediction rule for user u to item i is as follows,

    \hat{r}_{u,i} = \frac{1}{|\mathcal{N}_u|} \sum_{u' \in \mathcal{N}_u} \hat{r}_{u',i},    (1)

where N_u is a set of nearest neighbors of user u in terms of a certain similarity measurement such as cosine similarity, and r̂_{u',i} is the estimated preference of user u' (a close neighbor of user u) to item i. The aggregated and normalized score r̂_{u,i} is taken as the preference of user u to item i, which is further used for item ranking and top-K recommendation.

For our studied dual cold-start recommendation problem, we cannot build correlations between a cold-start user in the test data and a warm-start user in the training data using the data from the news domain only. The main idea of our transfer learning [3] solution is to leverage the correlations among the users in the APP domain, with the assumption that users with similar app-installation behaviors are likely to have similar tastes in news. For instance, two users who have installed apps of the same genre "business" may both prefer news articles on topics like finance.

With the cross-domain preference assumption, we first calculate the cosine similarity between a cold-start user u and a warm-start user u' in the APP domain as follows,

    s_{u,u'} = \frac{G_{u\cdot} G_{u'\cdot}^T}{\sqrt{G_{u\cdot} G_{u\cdot}^T} \sqrt{G_{u'\cdot} G_{u'\cdot}^T}},    (2)

where G_{u·} is the row vector w.r.t. user u from the user-genre matrix G. Once we have calculated the cosine similarity, for each cold-start user u, we first remove users with a small similarity value (e.g., s_{u,u'} < 0.1), and then take some (e.g., 100) most similar users to construct a neighborhood N_u.

For the item-level preference r̂_{u',i} in Eq.(1), we are not able to obtain such a score directly, because the item i is new for all users, including the warm-start users and the target cold-start user u. We thus propose to approximate the item-level preference using a category-level preference,

    \hat{r}_{u',i} \approx \hat{r}_{u',c(i)},    (3)

where c(i) can be the level-1 category or the level-2 category. We then have two types of category-level preferences,

    \hat{r}_{u',c(i)} = \hat{r}_{u',c_1(i)} = N_{u',c_1(i)},    (4)
    \hat{r}_{u',c(i)} = \hat{r}_{u',c_2(i)} = N_{u',c_2(i)},    (5)

where N_{u',c_1(i)} and N_{u',c_2(i)} denote the number of read items (by user u') belonging to the level-1 category c_1(i) and the level-2 category c_2(i), respectively.

Finally, with Eqs.(3-5), we can rewrite Eq.(1) as follows,

    \hat{r}_{u,i} \approx \frac{1}{|\mathcal{N}_u|} \sum_{u' \in \mathcal{N}_u} N_{u',c_1(i)},    (6)
    \hat{r}_{u,i} \approx \frac{1}{|\mathcal{N}_u|} \sum_{u' \in \mathcal{N}_u} N_{u',c_2(i)},    (7)

which will be used for preference prediction in our empirical studies. Specifically, the neighborhood N_u addresses the new user cold-start challenge, and the category-level preference N_{u',c_1(i)} or N_{u',c_2(i)} addresses the new item cold-start challenge.
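As a concrete reading of Eqs.(1)-(7), here is a minimal Python sketch of the NTL prediction pipeline: cosine similarity in the APP domain (Eq.(2)), neighborhood construction with the 0.1 similarity cutoff and top-100 truncation described above, and category-level aggregation (Eq.(6)). All function names and the dense-matrix representation are our own assumptions; the actual system code is not published.

```python
import numpy as np

def cosine_similarities(g_cold, G_warm):
    """Eq. (2): cosine similarity between a cold-start user's genre row
    g_cold (shape: n_genres) and every warm-start user's row in G_warm
    (shape: n_warm x n_genres)."""
    num = G_warm @ g_cold
    denom = np.linalg.norm(G_warm, axis=1) * np.linalg.norm(g_cold)
    return num / np.maximum(denom, 1e-12)  # guard against all-zero rows

def build_neighborhood(sims, min_sim=0.1, k=100):
    """Drop warm-start users below the similarity cutoff (0.1 in the
    paper), then keep the k most similar users (100 in the paper)."""
    candidates = np.where(sims >= min_sim)[0]
    ranked = candidates[np.argsort(-sims[candidates])]
    return ranked[:k]

def score_item(neighbors, N_c1, c1_of_item):
    """Eq. (6): mean, over the neighborhood, of the number of read
    items belonging to the target item's level-1 category."""
    if len(neighbors) == 0:
        return 0.0
    return float(N_c1[neighbors, c1_of_item].mean())
```

Scoring every test article this way for a cold-start user and sorting the scores in descending order yields the top-K list; Eq.(7) (NTL-C2) is identical with the level-2 counts in place of N_c1.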
3. EXPERIMENTAL RESULTS

3.1 Dataset and Evaluation Metrics

In our empirical studies, we use a real industry dataset, which consists of an APP domain and a news domain.

APP Domain. In the auxiliary domain, i.e., the APP domain, we have 827,949 users and 53 description terms (i.e., genres) of the users' installed mobile apps, where the genres are from Google Play. Considering our target task of news recommendation, we removed 14 undiscriminating or irrelevant genres such as tools, communication, social, entertainment, productivity, weather and dating. Finally, we have a matrix G with 827,949 users (rows) and 39 genres (columns), where each entry represents the number of times that a user has installed apps belonging to a genre.

News Domain. In the target domain, i.e., the news domain, we have two sets of data: a training set and a test set. The training data span from 10 January 2017 to 30 January 2017, and contain 806,167 users, 747,643 items (i.e., news articles), and 16,199,385 unique (user, item) pairs; that is, a user read about 16,199,385 / 806,167 ≈ 20.09 articles on average during that period. The test data are from 31 January 2017 and contain 3,597 new users, 28,504 new items (i.e., news articles), and 4,813 unique (user, item) pairs; that is, a cold-start user read about 4,813 / 3,597 ≈ 1.34 articles on that day. Note that we have |C_1| = 26 level-1 categories and |C_2| = 222 level-2 categories for the items in the news domain.

For performance evaluation, we adopt some commonly used evaluation metrics in ranking-oriented recommendation, namely precision, recall, F1, NDCG and 1-call. Specifically, we study the average performance of the top-15 recommended list generated for each cold-start user in the test data.
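For reference, the sketch below shows one standard way to compute the five top-K metrics for a single cold-start user. The usual formulas are well established, but the paper does not spell out its exact variants (e.g., the NDCG normalization), so treat the details here as assumptions.

```python
import numpy as np

def topk_metrics(ranked_items, read_items, K=15):
    """ranked_items: recommended item ids in rank order;
    read_items: set of items the user actually read in the test data."""
    topk = ranked_items[:K]
    hits = [1.0 if i in read_items else 0.0 for i in topk]
    prec = sum(hits) / K
    rec = sum(hits) / max(len(read_items), 1)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    # DCG with binary gains; ideal DCG puts all hits at the top ranks.
    dcg = sum(h / np.log2(r + 2) for r, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(r + 2)
               for r in range(min(len(read_items), K)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    one_call = 1.0 if sum(hits) > 0 else 0.0  # at least one hit in top K
    return prec, rec, f1, ndcg, one_call
```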
3.2 Baselines and Parameter Settings

We compare our proposed transfer learning solution with a random method and two popularity-based methods using category information.

• Random recommendation (Random). In Random, we randomly select K = 15 items in the test data for each cold-start user.

• Popularity-based ranking via level-1 category (PopRank-C1). In PopRank-C1, we first calculate the popularity p_{c_1} of each level-1 category c_1 ∈ C_1 in the training data, and then use r̂_i = p_{c_1(i)} (see Table 1) as the score to rank each item (i.e., article) i in the test data. For the most popular level-1 category, there may be more than K = 15 items (i.e., articles) in the test data; we then randomly take K items from that level-1 category for recommendation. A sketch of this baseline follows this section.

• Popularity-based ranking via level-2 category (PopRank-C2). In PopRank-C2, we use r̂_i = p_{c_2(i)} (see Table 1) as the prediction rule, similar to that of PopRank-C1.

For the number of neighbors in our neighborhood-based transfer learning method, we first fix it as 100, and then change it to 50 and 150 in order to study its impact. We denote our transfer learning solution with the level-1 category as NTL-C1 and that with the level-2 category as NTL-C2; their prediction rules are shown in Eq.(6) and Eq.(7), respectively. Note that for Random, PopRank-C1, PopRank-C2, and NTL with randomly selected neighbors, we repeat the experiments 10 times and report the average results.
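As referenced in the PopRank-C1 bullet above, here is a minimal sketch of that baseline under our own assumed data layout; PopRank-C2 is identical with level-2 categories. Note the popularity here is computed over reading events, a simplification of the per-user unique counts in Table 1.

```python
import random
from collections import Counter

def poprank_c1(train_c1_list, test_items, K=15, seed=0):
    """train_c1_list: the level-1 category id of every (user, item)
    reading record in the training data;
    test_items: list of (item_id, c1_id) pairs from the test data."""
    counts = Counter(train_c1_list)
    total = sum(counts.values())
    popularity = {c: n / total for c, n in counts.items()}  # p_{c1}
    # Every article of the most popular category ties at the top score,
    # so randomly sample K articles from that category, as in the paper.
    best = max(popularity.get(c1, 0.0) for _, c1 in test_items)
    pool = [item for item, c1 in test_items
            if popularity.get(c1, 0.0) == best]
    random.Random(seed).shuffle(pool)
    return pool[:K]
```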
3.3 Results

We report the main results in Table 2. From Table 2, we can make the following observations:

• The overall performance ordering is PopRank-C1, Random, PopRank-C2 ≪ NTL-C2 < NTL-C1, which clearly shows the effectiveness of our proposed transfer learning solution on the challenging dual cold-start recommendation problem.

• The performance of PopRank-C2 and PopRank-C1 is rather poor in comparison with our proposed solution. The reason is that popularity-based methods are non-personalized and simply select the single most popular level-1 or level-2 category for all users, which ignores the differences in users' news-reading preferences.

• For the relative performance of NTL-C1 and NTL-C2, we can see that NTL-C1 performs better, as expected, because the level-1 category may introduce more of a smoothing effect for the cold-start problem.

Table 2: Recommendation performance of random recommendation, popularity-based ranking, and our NTL on cold-start users.

Method       Prec@15    Rec@15     F1@15      NDCG@15    1-call@15
Random       5.56E-05   5.84E-04   9.78E-05   2.27E-04   8.34E-04
PopRank-C1   5.00E-05   5.59E-04   9.02E-05   2.38E-04   7.51E-04
PopRank-C2   1.46E-04   1.74E-03   2.65E-04   6.48E-04   2.20E-03
NTL-C1       0.0053     0.0645     0.0095     0.0255     0.0734
NTL-C2       0.0040     0.0501     0.0073     0.0206     0.0567

We further study the impact of the neighborhood size. The results of our NTL-C1 using 50, 100 and 150 neighbors are shown in Figure 2. We can see that the results are relatively stable with different numbers of neighbors, and configuring it as 100 usually produces the best performance.

Figure 2: Recommendation performance of our NTL with level-1 category (NTL-C1) using different neighborhood sizes. [Figure omitted: Prec@15, Rec@15, F1@15, NDCG@15 and 1-call@15 over neighborhood sizes 50, 100 and 150.]

In order to gain a deeper understanding of the transferred neighborhood, we study the performance of randomly choosing the same number (i.e., 100) of neighbors in our NTL-C1. We report the results in Figure 3, from which we can make the following observations:

• The neighborhood constructed using the app-installation behaviors is better than its random counterpart, which shows that the two domains are related and knowledge can indeed be transferred from one domain to the other.

• The difference between the two types of neighborhood is not as large as that between the popularity-based methods and our NTL in Table 2, which can be explained by the fact that a large portion of users' preferences or tastes in news articles are similar.

Figure 3: Recommendation performance of our NTL with level-1 category (NTL-C1) using random neighborhood and transferred neighborhood. [Figure omitted: the two neighborhoods compared on the same five metrics.]

4. CONCLUSIONS AND FUTURE WORK

In this paper, we study an important and challenging news recommendation problem called dual cold-start recommendation (DCSR), which aims to recommend the latest news articles (cold-start items) to newly registered users (cold-start users). Specifically, we propose a neighborhood-based transfer learning (NTL) solution, which addresses the new user cold-start challenge and the new item cold-start challenge via the neighborhood transferred from the APP domain and the category-level preferences in the news domain, respectively. Empirical results on a real industry dataset show that our NTL performs significantly more accurately than the few applicable methods, i.e., popularity-based ranking using category information.

For future work, we are interested in selecting some representative genres and categories in the two domains and building a mapping between them, which will be further used to study the neighborhood of the items.

5. ACKNOWLEDGMENT

We thank the support of Natural Science Foundation of China (NSFC) Nos. 61502307 and 61672358, China National 973 project No. 2014CB340304, and Hong Kong CERG projects Nos. 16211214, 16209715 and 16244616.

6. REFERENCES

[1] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 271-280, 2007.
[2] J. Liu, P. Dolan, and E. R. Pedersen. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI '10, pages 31-40, 2010.
[3] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.
[4] W. Pan. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447-453, 2016.
[5] F. Ricci, L. Rokach, and B. Shapira. Recommender Systems Handbook (Second Edition). Springer, 2015.