=Paper=
{{Paper
|id=Vol-1887/paper2
|storemode=property
|title=Cross-Domain Recommendation for Large-Scale Data
|pdfUrl=https://ceur-ws.org/Vol-1887/paper2.pdf
|volume=Vol-1887
|authors=Shaghayegh Sahebi,Peter Brusilovsky,Vladimir Bobrokov
|dblpUrl=https://dblp.org/rec/conf/recsys/SahebiBB17
}}
==Cross-Domain Recommendation for Large-Scale Data==
Cross-Domain Recommendation for Large-Scale Data

Shaghayegh Sahebi, Department of Computer Science, University at Albany – SUNY, Albany, NY 12222 (ssahebi@albany.edu)
Peter Brusilovsky, School of Information Sciences, University of Pittsburgh, Pittsburgh, PA 15260 (peterb@pitt.edu)
Vladimir Bobrokov, Rostelecom, 10 building 2, Bahrushina st, Moscow, Russia 115184 (vcomzzz@gmail.com)

ABSTRACT

Cross-domain algorithms have been introduced to help improve recommendations and to alleviate the cold-start problem, especially in small and sparse datasets. These algorithms work by transferring information from source domain(s) to a target domain. In this paper, we study whether such algorithms can be helpful for large-scale datasets. We introduce a large-scale cross-domain recommender algorithm derived from canonical correlation analysis and analyze its performance in comparison with single-domain and cross-domain baseline algorithms. Our experiments in both cold-start and hot-start situations show the effectiveness of the proposed approach.

KEYWORDS

Cross-Domain, Domain Selection

ACM Reference format: Shaghayegh Sahebi, Peter Brusilovsky, and Vladimir Bobrokov. 2017. Cross-Domain Recommendation for Large-Scale Data. In Proceedings of RecSysKTL Workshop @ ACM RecSys '17, August 27, 2017, Como, Italy, 7 pages. DOI: N/A

1 INTRODUCTION

Cross-domain recommendation systems are gradually becoming more attractive as a practical approach to improving the quality of recommendations. The number of social systems that collect user interactions and preferences in different domains is constantly increasing. Accordingly, using the information contributed by users in one system to help generate better recommendations in another system in a related domain has become more and more valuable. Especially important in this context is the ability of cross-domain collaborative filtering to soften the cold-start situation by offering meaningful suggestions at the very start of user interaction with a new domain. Starting with a few proof-of-concept studies [1, 2, 6, 10, 19], cross-domain recommenders emerged as a sizable stream of research in the recommender systems field.

Yet, in some sense, the work is still in its early stages. While many different models have been proposed and explored, the dominant approach to exploring new cross-domain recommendation ideas is to use public datasets that are relatively small in comparison with the full scale of data (items and users) in real-life recommender systems. Full-scale cross-domain datasets are hard to find, so authors frequently use simulated cross-domain datasets. For example, Iwata and Takeuchi propose a matrix factorization based approach in [8] where neither users nor items are shared between domains. Although they used a large-scale dataset (combining EachMovie, Netflix, and MovieLens), their large-scale dataset is not from a cross-domain system. Rather, this movie rating dataset is divided into random user and item splits. A similar splitting in a large-scale movie domain can be seen in [15]. Moreover, the rare large-scale cross-domain experiment reports in the literature focus mostly on content-based cross-domain recommenders [4, 13, 18]. In [12], Loni et al. use factorization machines for domains in a large-scale Amazon dataset. In their experiments, better use of within-domain information generated better results compared to using cross-domain information. While the current literature shows the importance of cross-domain recommender systems, the limitations reviewed above do not allow us to see how cross-domain recommender algorithms scale up.

This paper attempts to fill the gap in the design and evaluation of large-scale cross-domain recommenders by proposing a cross-domain collaborative filtering algorithm and evaluating it using a dataset collected from a multi-domain recommender system, Imhonet. The proposed algorithm, CD-LCCA, is specifically designed for scalability.

The proposed approach relies on canonical correlation analysis (CCA) [7] for transferring information from the source domain to the target domain. CCA has been used in context-aware single-domain recommendation [5], content-based cross-domain recommendation [4], and medium-scale cross-domain collaborative filtering [17]. However, it has not been scaled for large-scale cross-domain collaborative filtering. In this paper, we use a computationally efficient implementation of CCA to model cross-domain recommendations in a large-scale dataset. We present our model in Section 2. We compare the performance of our model with cross-domain and single-domain baselines in Section 3, and analyze its cold-start behavior in Section 4. Finally, we present a time performance analysis of the algorithm in Section 5.
How- to help generate better recommendations in another system in a ever, it has not been scaled for large-scale cross-domain collabo- related domain has become more and more valuable. Especially rative filtering. In this paper, we use a computationally efficient important in this context is the ability of cross-domain collaborative implementation of CCA to model cross-domain recommendations filtering to soften the cold-start situation by offering meaningful in a large-scale dataset. We present our model in Section 2. We com- suggestions at the very start of user interaction with a new domain. pare the performance of our model with cross-domain and single Starting with a few proof-of-concept studies [1, 2, 6, 10, 19], cross- domain baselines in Section 3, and analyze its cold-start behavior domain recommenders emerged in a sizable stream of research in in Section 4. Finally, we present a time performance analysis of the the recommender systems field. algorithm in Section 5. Yet, in some sense, the work is still in early stages. While many different models have been proposed and explored, the dominating 2 LARGE-SCALE CCA-BASED approach to exploring new cross-domain recommendation ideas is to use public datasets that are relatively small in comparison with CROSS-DOMAIN ALGORITHM (CD-LCCA) the full scale of data (items and users) in real-life recommender 2.1 Background CCA is a multivariate statistical model that studies the interrela- RecSysKTL Workshop @ ACM RecSys ’17, August 27, 2017, Como, Italy tionships among sets of multiple dependent variables and multiple © 2017 Copyright is held by the author(s). independent variables[7]. Calculating CCA can be very resource- consuming especially in traditional approaches that should calcu- late QR-decomposition or singular value decomposition of large data matrices. To avoid this problem, Lu and Foster developed an iterative algorithm that can approximate CCA on very large 9 datasets[14]. This approach relies on LING, a gradient-based least Combining Equations 1, 2, and 3, we can now map between the squares algorithm that can work on large-scale matrices. To com- original source and target rating matrices as presented in Equation pute CCA in L-CCA in [14], first a projection of one of the data 4 and have an estimation of user ratings in the target domain (Ŷ ). matrices on a randomly-generated small matrix is produced. Then, a QR-decomposition of this smaller matrix is calculated. After that, Ŷ = X MWx c PWy−1 c N −1 (4) CCA is calculated iteratively, by applying LING on the reduced- When the rating matrix sizes are too large, calculating the multi- sized QR-decompositions of the original data matrices, in each plications in 4 can be resource-consuming. To resolve this, we take iteration. Every time after running LING, a QR-decomposition is advantage of the fact that [A|B]−1C = [A−1C |B −1C], and separate calculated for numerical stability. Here, we build our large-scale the source matrix into multiple smaller matrices, using column- cross-domain recommender algorithm based on L-CCA proposed wise partitioning. Then, we perform the multiplication on each of by Lu and Foster. these matrices and eventually join the results together. Equation 4 gives us the opportunity to relate the source and 2.2 Model target domain rating matrices. Based on that, we can estimate the Large scale CCA finds a lower-dimensional representation of each ratings in target domain Y based on ratings in source domain X . 
2.2 Model

Large-scale CCA finds a lower-dimensional representation of each of the input matrices and then calculates the canonical correlation analysis between these two smaller matrices. To base our cross-domain recommender algorithm on L-CCA, suppose that we have an n × m source domain rating matrix X and an n × p target domain rating matrix Y. Here, n represents the number of users shared between the source and target domains; m is the number of items in the source domain; and p is the number of items in the target domain. The goal of our model is to estimate user ratings in the target domain (the Y_{ij}s), given user ratings in the source domain (the X_{ij}s). We find the mapping between these two domains using L-CCA as explained in the following.

Suppose that X_c (n × x_c) is a lower-dimensional matrix that represents the source domain rating matrix X, and Y_c (n × y_c) is a lower-dimensional matrix that represents the target rating matrix Y in the L-CCA algorithm. Calculating the canonical correlations between X_c and Y_c leads us to two canonical variates (X_c W_{x_c} of size n × k_{cca} and Y_c W_{y_c} of size n × k_{cca}) and a diagonal matrix P (k_{cca} × k_{cca}) that contains the canonical correlations between these variates. Using these canonical correlations and variates, we can map X_c to Y_c (and vice versa). For example, Y_c can be obtained using Equation 1:

Y_c = X_c W_{x_c} P W_{y_c}^T    (1)

Although Equation 1 relates the lower-dimensional representations of the original source and target domains (X_c and Y_c), we need to map the original source and target matrices (X and Y) to estimate user ratings in them. To build a relationship between the original source and target domain matrices, we first look at the relationship between each domain matrix and its lower-dimensional representation. Without loss of generality, we consider the source domain relationships. X_c is built in the first step of L-CCA by solving an iterative least squares problem, with a QR-decomposition in each iteration. Although we lose the mapping information between X and X_c in this iterative process, having both X and the final X_c matrices, we can restore their mapping. We can write the relationship between X and X_c as X_c = X M. Here, M is an m × x_c mapping that projects X into X_c; thus:

M = X^{-1} X_c    (2)

The same can be applied to find the mapping N between the target rating matrix Y and its lower-dimensional representation Y_c:

N = Y^{-1} Y_c    (3)

Combining Equations 1, 2, and 3, we can now map between the original source and target rating matrices as presented in Equation 4, and obtain an estimate of the user ratings in the target domain (Ŷ):

\hat{Y} = X M W_{x_c} P W_{y_c}^{-1} N^{-1}    (4)

When the rating matrices are too large, calculating the multiplications in Equation 4 can be resource-consuming. To resolve this, we take advantage of the fact that [A|B]^{-1} C = [A^{-1} C | B^{-1} C], and separate the source matrix into multiple smaller matrices using column-wise partitioning. Then, we perform the multiplication on each of these matrices and eventually join the results together.

Equation 4 gives us the opportunity to relate the source and target domain rating matrices. Based on it, we can estimate the ratings in the target domain Y from the ratings in the source domain X. In other words, we can estimate user i's rating on item j in the target domain, given user i's ratings in the source domain, using Equation 5:

\hat{y}_{i,j} = \sum_{q=1}^{m} X_{i,q} \sum_{o=1}^{x_c} M_{q,o} \sum_{l=1}^{k_{cca}} (W_{x_c})_{o,l} P_{l,l} \sum_{r=1}^{y_c} (W_{y_c}^{-1})_{l,r} (N^{-1})_{r,j}    (5)

Thus, our cross-domain recommender system can suggest the most relevant items to users in the target domain, given the users' ratings in the source domain (a sketch of this mapping, with the column-wise partitioning, appears at the end of this section). In the following sections, we evaluate the proposed model in both the cold-start and hot-start settings, using a large-scale dataset.

Note that the focus of our proposed model is on cross-domain recommenders with shared sets of users across domains. Although some of the research in the area of cross-domain recommender systems is focused on domains with non-overlapping data [8, 11, 20, 21], the problem of lacking shared users has been a matter of debate [3]. Some approaches have tried to address this problem by sharing a subset of users between domains [9, 22]. We leave this expansion of the proposed model for future work.
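The mapping of Equation 4, together with the column-wise partitioning trick, can be sketched as follows. This is a minimal dense-NumPy illustration under toy-sized assumptions, not the authors' Matlab implementation: least-squares solves and pseudo-inverses stand in for the matrix inverses in Equations 2–4, and all names are illustrative.

```python
import numpy as np

def estimate_target_ratings(X, Y, Xc, Yc, Wx, Wy, rho, n_blocks=4):
    """Sketch of Equation 4: Yhat = X M Wxc P Wyc^{-1} N^{-1}."""
    # Eq. 2: M maps X onto its low-dimensional representation Xc.
    M = np.linalg.lstsq(X, Xc, rcond=None)[0]            # m x xc
    # Eq. 3: N maps Y onto Yc.
    N = np.linalg.lstsq(Y, Yc, rcond=None)[0]            # p x yc
    # Core m x p mapping M Wxc P Wyc^{-1} N^{-1}; pseudo-inverses are
    # used because Wy and N are generally not square.
    core = M @ Wx @ np.diag(rho) @ np.linalg.pinv(Wy) @ np.linalg.pinv(N)
    # Column-wise partitioning: accumulate X @ core block by block so
    # that only one column block of X is multiplied at a time.
    Yhat = np.zeros((X.shape[0], Y.shape[1]))
    for cols in np.array_split(np.arange(X.shape[1]), n_blocks):
        Yhat += X[:, cols] @ core[cols, :]
    return Yhat
```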
3 DO LARGE-SCALE CROSS-DOMAIN ALGORITHMS HELP?

In our first set of experiments, we study whether the proposed cross-domain recommender system is useful on large-scale datasets. In other words, by comparing the cross-domain and single-domain recommendation results, we explore whether target domain user data alone can be enough for achieving good recommendations in large-scale datasets, or whether auxiliary information can be helpful.

3.1 Dataset

We use the Imhonet dataset to carry out the experiments in this paper. This is an anonymized dataset obtained from an online Russian recommender service, Imhonet.ru. It allows users to rate and review a range of items from various domains, from books and movies to mobile phones and architectural monuments. Imhonet is a true multi-domain system: while it supported different domains, each domain was treated almost as an independent sub-site with separate within-domain recommendations. The system also contains many aspects of a social network, including friendship links, blogs, and comments. The combination of explicit user feedback (ratings) and diverse domains makes Imhonet very unique and valuable for cross-domain recommendation. We use a dataset that includes Imhonet's four large domains: books, movies, games, and perfumes. It contains a full set of user ratings (at the time of collection) across the four domains. Each rating record in the dataset includes a user ID, an item ID, and a rating value between zero (not rated) and ten. The same user ID indicates the same user across the sets of ratings. Some basic statistics of this dataset are shown in Table 1.

Table 1: Basic Statistics for the Imhonet Dataset.

                              Book        Game        Movie      Perfume
  user size                 362448       72307       426897       19717
  item size                 167384       12768        90793        3640
  density                  0.00022     0.00140      0.00073      0.00350
  # records              13438520     1324945     28281946       253948
  avg. # ratings per user  37.0771     18.2339        66.30      12.8796
  avg. # ratings per item  80.2856    103.7708     311.4992      69.7659

To pre-process this dataset, we find the shared users across each category pair (sketched below).
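This shared-user pre-processing can be sketched as follows. This is a hypothetical illustration: the column names (`user_id`, `item_id`, `rating`) are assumptions about the dataset layout, and a dense pivot is only feasible on small samples of the data.

```python
import pandas as pd

def shared_user_matrices(src_ratings: pd.DataFrame, tgt_ratings: pd.DataFrame):
    """Keep only users who rated items in both domains and return
    aligned user-by-item rating matrices X (source) and Y (target)."""
    shared = set(src_ratings.user_id) & set(tgt_ratings.user_id)
    src = src_ratings[src_ratings.user_id.isin(shared)]
    tgt = tgt_ratings[tgt_ratings.user_id.isin(shared)]
    # Zero stands for "not rated", matching the dataset's rating scale.
    X = src.pivot_table(index="user_id", columns="item_id",
                        values="rating", fill_value=0)
    Y = tgt.pivot_table(index="user_id", columns="item_id",
                        values="rating", fill_value=0)
    return X.sort_index(), Y.sort_index()   # identical user row order
```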
3.2 Experiment Setup

To run the experiments, we used a user-stratified 5-fold cross-validation setting: 20% of the users are selected as test users and the rest (80%) are used as training users. We recommend items to the test users given the training data and 20% of their ratings. Some of the algorithms have parameters that should be selected by cross-validation. To find the best set of parameters for each algorithm, we select 15% of the users as "validation" users and remove 80% of their ratings from the training set. We repeat the experiments 5 times and report the average performance of the algorithms.

To measure the performance of the algorithms, we use Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Although there are other measures, such as rank-based ones, to evaluate recommender systems, we choose these two error measures because of the goals of the proposed and baseline algorithms: they try to estimate user ratings, instead of optimizing recommendation rankings. Rank-based measures, such as precision, recall, and nDCG, would not be appropriate for, or representative of, these recommenders' performance.

For the single-domain algorithm, we use only the target domain dataset. However, for cross-domain algorithms, we have both source and target datasets. To be able to compare single-domain and cross-domain algorithms, we remove the same set of ratings for all of the algorithms.
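A minimal sketch of this protocol, assuming per-user rating arrays; the fold logic and function names are illustrative, not the paper's exact code:

```python
import numpy as np

def user_folds(user_ids, n_folds=5, seed=0):
    """Split users (not individual ratings) into folds; each fold in
    turn serves as the 20% test-user set."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(np.unique(user_ids)), n_folds)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```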
3.3 Results

There are four domains in the dataset: books, movies, perfumes, and games. This results in 12 domain pairs to study. Some statistics of the domain pairs are presented in Table 3.

Table 3: Domain and domain-pair data size statistics for the Imhonet dataset.

  source    target    user size   source item size   target item size   source density   target density
  book      game         41756         125688             11407             0.0007           0.0020
  book      movie       186877         155765             85892             0.0003           0.0014
  book      perfume      16750         105805              3545             0.0011           0.0037
  game      book         41756          11407            125688             0.0020           0.0007
  game      movie        49784          11715             75599             0.0019           0.0028
  game      perfume       6297           6854              3232             0.0030           0.0041
  movie     book        186877          85892            155765             0.0014           0.0003
  movie     game         49784          75599             11715             0.0028           0.0019
  movie     perfume      17882          63708              3565             0.0041           0.0037
  perfume   book         16750           3545            105805             0.0037           0.0011
  perfume   game          6297           3232              6854             0.0041           0.0030
  perfume   movie        17882           3565             63708             0.0037           0.0041

We can see that the "books" and "movies" domain pairs have the largest numbers of shared users, while the "games" and "perfumes" domains have the fewest common users. The "books" domain has the largest and the "perfumes" domain the smallest number of items. Also, the "books" domain is among the most sparse domains, while the "perfumes" domain is the least sparse one.

We run the proposed and baseline algorithms on each of these domain pairs. Figures 1 and 2 show the RMSE and MAE of the algorithms on the 12 domain pairs of Imhonet. The reported error bars represent a 95% confidence interval for the errors.

[Figure 1: RMSE of the algorithms (CD-LCCA, CD-SVD, SD-SVD) on the 12 Imhonet domain pairs, sorted by CD-LCCA RMSE.]

[Figure 2: MAE of the algorithms on the 12 Imhonet domain pairs, ordered by the MAE of CD-LCCA.]

As we can see in these figures, the use of cross-domain data with a competitive algorithm originally designed for a single domain does not really help: the single-domain algorithm (SD-SVD) performs better than, or similarly to, the cross-domain baseline (CD-SVD) in many domains. Only in the "movie → book" and "game → movie" domain pairs is CD-SVD significantly better than SD-SVD. The domains in these two pairs are semantically closer than those in the other domain pairs. However, CD-LCCA performs significantly better than both CD-SVD and SD-SVD in all of the domain pairs. Thus, CD-LCCA is able to see beyond the semantic relationships between domains and capture latent similarities that may not seem intuitive. Also, we can see that the confidence intervals in most of the domain pairs (except for "game → perfume" and "perfume → book") are small.

To understand whether the average errors of the algorithms are related to each other across domain pairs, we look at the RMSE correlations between the algorithms, reported in Table 2.

Table 2: Correlation of the algorithms' RMSE with each other. *: significant with p-value < 0.01.

              CD-LCCA     CD-SVD     SD-SVD
  CD-LCCA         1       0.1993    -0.1909
  CD-SVD       0.1993         1      0.7416*
  SD-SVD      -0.1909     0.7416*        1

Here, we see that the RMSE of the CD-SVD and SD-SVD algorithms is highly correlated. However, CD-LCCA's RMSE does not have any significant correlation with the two baseline algorithms' performance.

Altogether, we conclude that CD-LCCA is helpful in estimating user preferences using auxiliary domain information in large-scale datasets; that the baseline cross-domain algorithm not designed for this purpose (CD-SVD) may harm the recommendation results rather than help; that the errors of the baseline recommender algorithms are correlated with each other; and that CD-LCCA can discover unintuitive, but useful, similarities between domain pairs that are not discovered by CD-SVD.

4 DO LARGE-SCALE CROSS-DOMAIN ALGORITHMS ALLEVIATE COLD-START?

One of the major problems in the recommender systems literature is the cold-start problem [16]. Cross-domain recommenders aim to alleviate this problem by transferring the target user's source profile information for recommendation in the target domain. In CD-LCCA, this transfer happens by mapping the source and target domains using canonical variates and correlations, as in Equation 4. In this section, we investigate the success of such transfer by comparing the performance of CD-LCCA, CD-SVD, and SD-SVD in the cold-start setting. To understand how each of these algorithms performs in the cold-start setting, we group the test users of each dataset based on their target domain profile size. Then, we calculate the error for each group of users under each of the algorithms (sketched below). Figure 3 shows the number of test users vs. target domain profile size across all of the domain pairs. We can see that most of the test users have a small profile (fewer than 10 items) in the target domain. A few users have 100 or more items in their target profile; to keep the plot readable, we do not show these users. Also, there is a concave shape at small (fewer than 10 items) target domain profile sizes. This happens because Imhonet asked some users to rate at least 20 items before providing recommendations to them. Since we only use 20% of the test users' ratings in their target profiles, this increase in profile size appears for profiles that have fewer than 10 items.

[Figure 3: Target profile sizes of users in the Imhonet dataset.]
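The per-group error computation referenced above can be sketched as follows; the per-user error arrays and the cap at 100 items are assumptions mirroring the figures:

```python
import numpy as np

def rmse_by_profile_size(profile_sizes, user_mse, max_size=100):
    """Group users by target-domain profile size and return the
    user-based RMSE for each size, as plotted in Figures 4-7."""
    profile_sizes = np.asarray(profile_sizes)
    user_mse = np.asarray(user_mse)   # mean squared error per user
    out = {}
    for s in range(1, max_size + 1):
        mask = profile_sizes == s
        if mask.any():
            out[s] = float(np.sqrt(user_mse[mask].mean()))
    return out
```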
Figures 4 and 5 show the RMSE and MAE of the algorithms in the cold-start setting based on target user profile size, averaged over all of the domain pairs.

[Figure 4: User-based RMSE of the algorithms in the Imhonet dataset, averaged over all domain pairs and sorted by the users' target domain profile size.]

[Figure 5: User-based MAE of the algorithms in the Imhonet dataset, averaged over all domain pairs and sorted by the users' target domain profile size.]

As we can see in these figures, on average over all domain pairs, CD-LCCA performs significantly better than both of the baselines. Also, the single-domain baseline (SD-SVD) on average performs better than the cross-domain baseline (CD-SVD). At smaller profile sizes, SD-SVD's error is significantly lower than CD-SVD's error. As the target domain profile size grows, the errors of the two baseline algorithms show no significant differences.

To better understand the cold-start situation in each of the domain pairs, we look at the results of the domain-pair combinations separately. Figures 6 and 7 show each algorithm's cold-start RMSE and MAE in each of the domain pairs. Note that we have plotted the errors for target profile sizes ranging from one to 100 items; in some domain pairs (e.g., "game → perfume"), the maximum user profile size is less than 100 and thus the plot is discontinued.

[Figure 6: User-based RMSE of the algorithms in the Imhonet dataset, averaged on each domain pair and sorted by the users' target domain profile size.]

As we can see, for small profile sizes, CD-LCCA performs significantly better than the baseline algorithms in all domain pairs except "game → perfume". This shows that CD-LCCA can successfully transfer useful information from most source domains to the target domain, especially in the cold-start situation. For "book" and "movie" target domains, the superior performance of CD-LCCA continues at large profile sizes. But in "game" and "perfume" target domains, the performance difference between the algorithms becomes insignificant after users have enough items in their target profile (between 25 and 45 items, depending on the domain pair). There are fewer users with larger profile sizes in these domains; thus, we have lower confidence in the algorithms' performance and wider confidence intervals, leading to insignificant differences.

Comparing CD-SVD and SD-SVD, we can see that they mostly have similar results. In all experiments with the "movie" domain as the source domain, SD-SVD performs significantly better than CD-SVD from the beginning. But in "game → movie" and "perfume → movie", CD-SVD can be significantly better than SD-SVD, especially at larger profile sizes.
Accordingly, at smaller target profile sizes, not only does CD-SVD not help, but it can even harm the recommendation results. This shows that while CD-LCCA can efficiently use the extra source domain information, CD-SVD cannot handle this information effectively.

[Figure 7: User-based MAE of the algorithms in the Imhonet dataset, averaged on each domain pair and sorted by the users' target domain profile size.]

Looking at the error trends, for some domain pairs (e.g., "movie → book" and "game → movie"), we see an initial error increase as the target profile size grows. Although we would expect to see smaller errors as we have more information from users in the target domain, the observed trend runs against this expectation. This trend appears in all algorithms, including the single-domain baseline (SD-SVD). Thus, such behavior cannot be attributed to the use of extra information in the cross-domain algorithms.

Altogether, we can conclude that not only can CD-LCCA efficiently handle extra information from semantically-related domains, but it can also understand the relationships between source and target domains that appear to be unrelated.

5 PERFORMANCE ANALYSIS

In CD-LCCA, calculating the large-scale CCA costs O(Nnp(N_2 + k_pc) + Nnk^2), in which N is the number of iterations for least squares; n is the number of data points (users); p is the number of items in the target domain; N_2 is the number of iterations to compute Y_r using gradient descent; k_pc is the number of top singular vectors used in LING; and k is the number of components. The multiplications in CD-LCCA depend on the number of nonzero elements in the matrices. In the worst case of multiplying dense matrices, the multiplications cost O(npk + nk^2). Thus, as a whole, CD-LCCA costs O(Nnp(N_2 + k_pc) + Nnk^2 + npk). Since k_pc ≤ k and k ≤ p, CD-LCCA costs less than O(Nnp(N_2 + k)).
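The last simplification can be made explicit; a short derivation, assuming only the stated bounds k_pc ≤ k and k ≤ p (and N ≥ 1):

```latex
\begin{align*}
Nnp(N_2 + k_{pc}) + Nnk^2 + npk
  &\le Nnp(N_2 + k) + Nnpk + Nnpk
     && \text{(since } k_{pc} \le k,\; k \le p,\; 1 \le N\text{)}\\
  &= Nnp(N_2 + 3k) \;=\; O\big(Nnp(N_2 + k)\big).
\end{align*}
```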
In our experiments, we ran all of the algorithms on two similar machines: a MacOS machine with 64GB RAM and two 4-core 2.26GHz Intel Xeon CPUs, and a Linux (CentOS) machine with 64GB RAM and two 4-core 2.40GHz Intel Xeon CPUs. On average, running CD-LCCA in Matlab on each domain pair took 21210 seconds (close to 6 hours), while running CD-SVD with GraphChi took almost 4 hours.

6 CONCLUSIONS

This work presented a large-scale cross-domain collaborative filtering approach, CD-LCCA. Our experiments on a large-scale user-item rating dataset with 12 domain pairs showed that cross-domain collaborative filtering can be helpful even in large-scale target domains. We saw that CD-LCCA improves recommendation results in both hot-start and cold-start settings in all domain pairs, while the baseline cross-domain algorithm helped only in domain pairs with higher semantic similarity. In some cases, adding auxiliary information to the baseline cross-domain algorithm harmed the results. Thus, we concluded that CD-LCCA is able to capture unintuitive relationships between different domains that are not understood by the baseline algorithms. Our cold-start analysis showed that the proposed model is especially helpful in the cold-start setting. CD-LCCA focuses on domains with shared users. As a follow-up to this work, we will expand CD-LCCA to perform cross-domain recommendation in domains with partly-shared, and partly exclusive, users.

REFERENCES

[1] Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. 2007. Cross-Domain Mediation in Collaborative Filtering. In Proceedings of the 11th International Conference on User Modeling (UM '07). Springer-Verlag, Berlin, Heidelberg.
[2] Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. 2008. Mediation of user models for enhanced personalization in recommender systems. User Modeling and User-Adapted Interaction 18, 3 (Aug. 2008).
[3] Paolo Cremonesi and Massimo Quadrana. 2014. Cross-domain Recommendations Without Overlapping Data: Myth or Reality?. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 297–300.
[4] Ali Elkahky, Yang Song, and Xiaodong He. 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conference Committee (IW3C2).
[5] Siamak Faridani. 2011. Using canonical correlation analysis for generalized sentiment analysis, product recommendation and search. In Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys '11). ACM, New York, NY, USA.
[6] Ignacio Fernández-Tobías, Iván Cantador, Marius Kaminskas, and Francesco Ricci. 2012. Cross-domain recommender systems: A survey of the state of the art. In Spanish Conference on Information Retrieval.
[7] Harold Hotelling. 1936. Relations Between Two Sets of Variates. Biometrika 28, 3/4 (1936).
[8] Tomoharu Iwata and Koh Takeuchi. 2015. Cross-domain recommendation without shared users or items by sharing latent vector distributions. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 379–387.
[9] Arto Klami, Guillaume Bouchard, Abhishek Tripathi, et al. 2014. Group-sparse Embeddings in Collective Matrix Factorization. In Proceedings of the International Conference on Learning Representations (ICLR) 2014.
[10] Bin Li. 2011. Cross-domain collaborative filtering: A brief survey. In Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on. IEEE, 1085–1086.
[11] Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Transfer learning for collaborative filtering via a rating-matrix generative model. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 617–624.
[12] Babak Loni, Alan Said, Martha Larson, and Alan Hanjalic. 2014. 'Free lunch' enhancement for collaborative filtering with factorization machines. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 281–284.
[13] Yucheng Low, Deepak Agarwal, and Alexander J. Smola. 2011. Multiple domain user personalization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 123–131.
[14] Yichao Lu and Dean P. Foster. 2014. Large scale canonical correlation analysis with iterative least squares. In Advances in Neural Information Processing Systems. 91–99.
[15] Weike Pan, Evan Wei Xiang, and Qiang Yang. 2012. Transfer Learning in Collaborative Filtering with Uncertain Ratings. In AAAI.
[16] Denis Parra and Shaghayegh Sahebi. 2013. Recommender systems: Sources of knowledge and evaluation metrics. In Advanced Techniques in Web Intelligence-2. Springer, 149–175.
[17] Shaghayegh Sahebi and Peter Brusilovsky. 2015. It Takes Two to Tango: An Exploration of Domain Pairs for Cross-Domain Collaborative Filtering. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 131–138.
[18] Weiqing Wang, Zhenyu Chen, Jia Liu, Qi Qi, and Zhihong Zhao. 2012. User-based collaborative filtering on cross domain by tag transfer learning. In Proceedings of the 1st International Workshop on Cross Domain Knowledge Discovery in Web and Social Network Mining. ACM, 10–17.
[19] Pinata Winoto and Tiffany Tang. 2008. If You Like the Devil Wears Prada the Book, Will You also Enjoy the Devil Wears Prada the Movie? A Study of Cross-Domain Recommendations. New Generation Computing 26 (2008).
[20] Lei Wu, Wensheng Zhang, and Jue Wang. 2014. Fusion Hidden Markov Model with Latent Dirichlet Allocation Model in Heterogeneous Domains. In Proceedings of the International Conference on Internet Multimedia Computing and Service. ACM, 261.
[21] Yu Zhang, Bin Cao, and Dit-Yan Yeung. 2010. Multi-domain collaborative filtering. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 725–732.
[22] Lili Zhao, Sinno Jialin Pan, Evan Wei Xiang, Erheng Zhong, Zhongqi Lu, and Qiang Yang. 2013. Active transfer learning for cross-system recommendation. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI 2013), Bellevue, Washington, USA. 1205.