Augmenting Collaborative Recommender by Fusing Explicit Social Relationships Quan Yuan, Shiwan Zhao Li Chen Yan Liu IBM China Research Department of Computer IBM T.J. Watson Research Laboratory Science, Hong Kong Baptist Center Beijing, 100193, China University Yorktown, NY 10598 {quanyuan, Hong Kong liuya@us.ibm.com zhaosw}@cn.ibm.com lichen@comp.hkbu.edu.hk Shengchao Ding Xiatian Zhang Wentao Zheng Institute of Computing IBM China Research IBM China Research Technology, Chinese Academy Laboratory Laboratory of Sciences Beijing, 100193, China Beijing, 100193, China Beijing, 100190, China xiatianz@cn.ibm.com zhengwt@cn.ibm.com dingshengchao@ict.ac.cn ABSTRACT Keywords Nowadays social websites have become a major trend in the Recommender Systems, Collaborative Filtering, Social Re- Web 2.0 environment, enabling abundant social data avail- lationship, Random Walk able. In this paper, we explore the role of two types of social relationships: membership and friendship, while being fused 1. INTRODUCTION with traditional CF (Collaborative Filtering) recommender In recent years, collaborative-filtering (CF) based recom- methods in order to more accurately predict users’ interests mender systems have been widely developed in order to ef- and produce recommendations to them. Through an ex- fectively support users’ decision-making process especially ploratory evaluation with real-life dataset from Last.fm, we when they are confronted with overwhelming information have revealed respective effects of the two explicit relation- (e.g. a large amount of product options that popularly ap- ships and furthermore their combinative impacts. In addi- pear in the current Web environment). There are two basic tion, the fusion is conducted via random walk graph model in entities considered by the recommender: the user and the comparison with via weighted neighborhood similarity ma- item. The user provides his rates on items (e.g. movies, trix, so as to identify the best performance platform. In- music, books, etc.) that he has experienced, based on which depth analysis on the experimental data particularly shows the system can connect him with persons who have similar the significant improvement by up to 8% on recommenda- interests and then recommend to him items that are pre- tion accuracy, by embedding social relationships in CF via ferred by these like-minded neighbors. In some cases which graph model. don’t have rating values available, the user’s interaction with items can also be considered. That is, if we can only get the information that a user watched a movie, the ”rating” Categories and Subject Descriptors is 1; otherwise it is 0. The recommendation method based H.5.3 [Group and Organization Interfaces]: [Collabora- on this binary rating matrix is also named as log-based CF tive Filtering, Computer-supported cooperative work, Eval- [24]. The traditional approaches can be hence regarded as uation/methodology]; H.3.3 [Information Storage and implicit ways to infer the social relationship between users. Retrieval]: [Information Filtering] However, it inevitably brings the limitation when few users rated or viewed few items (i.e. the sparsity problem), to make it hard to infer such preference relationship. With the increasing emergence of social network services, many General Terms websites support online user communities, such as Youtube, Algorithms, Experimentation, Human Factors Last.fm, del.icio.us, and e-commerce sites including Ama- zon.com and eBay.com. Community facilities are provided so that users can create and access to their community infor- mation and communicate with friends or members. For ex- ample, on Last.fm (a popular music recommender website), Permission to make digital or hard copies of all or part of this work for the user can establish friendship with others by ”finding peo- personal or classroom use is granted without fee provided that copies are ple” and/or join a group of users having similar music tastes not made or distributed for profit or commercial advantage and that copies (e.g. through ”finding groups”). bear this notice and the full citation on the first page. To copy otherwise, to We term this kind of community relationship as explicit republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. social relationship, since it is directly defined by users, rather Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00. than inferred by the system. As a matter of fact, two types Explicit social relationships propose a novel graph model to fuse the social rela- tionship with rating matrix, and adopt random walk Friendship Membership algorithm to produce neighborhood similarities for rec- ommendations. With a real-life dataset, we compared it with the weighted-similarity approach and identify Fusion the superior performance of graph fusion in improving recommendation accuracy. Standard collaborative filtering recommenders In the next section, we first provide a brief review of re- based on implicit relationship inferring lated work, and then propose a systematical approach to study the use of explicit social relations and incorporate them in collaborative recommenders via graph model in ad- Figure 1: Fusion of Friendship and Membership into dition to via weighted-similarity. Next, we show the ex- Standard CF periment design and results analysis, followed by the final section of conclusion and future work. of explicit relationships are commonly available on the cur- rent websites (as in the example of Last.fm): friendship 2. RELATED WORK and membership. For instance, users A and B are friends We review the related literature of fusing heterogeneous given that A adds B in his friend list (or conversely), but data sources with rating matrix to improve standard CF it does not refer that they must have similar music tastes. from two perspectives: Fusion of Social Relationship, and On the other hand, if A and B join in the same group, it Graph-based Recommender Algorithm. indicates that they have the membership relation and are likely with the similar interest (e.g. on ”Beatles” provided it 2.1 Fusion of Social Relationship is the group’s title). Since popular user-based and item-based CF algorithms It is believed that explicit social relationship can be likely that only rely on user-item rating matrix always suffer from applied to compensate the limitation of implicit relation- sparse and imbalance of rating data, researchers have started ship inference approach and improve the accuracy of recom- to incorporate other data sources to improve standard CF. mender systems [1]. Some researchers have recently been Balabanovic et al. [6] were among those who first investigate engaged in fusing such kind of information into traditional content-based systems that make use of the descriptive data CF methods [1, 15, 19]. However, to our knowledge, most about an item. Melville et al. [7] enhanced CF by using works purely concentrate on friendship data [4, 16]. Their content of a movie, e.g., movie genre. Pazzani [8] investi- tentative studies unfortunately show that the fusion some- gated hybrid methods using both of user data (demographic times does not work well due to the friendship’s inherent information) and item data (content) for improving recom- ambiguity as a relational descriptor [5]. Moreover, existing mendation accuracy. fusion approaches have been mainly based on the rating- Recently, with the increasing development of social web- matrix. It is in essence lack of exploration of other possible sites and appearance of social data, researchers have begun ways that may be potentially more effectively able to fuse to pay attention to the social data and explored its usage in the explicit social relationship with CF recommenders. As recommender systems. Konstas [21] adopted Random Walk the social data is inherently in a graph structure, fusion via with Restart to model the friendship and social annotation graph may be a good approach. (tagging) in a music track recommendation system. Gol- Given that the membership contains more information beck [11] used trust relationship in social network to improve implying the user’s preferences, such as his interest on the movie recommendations. [15] used social network data for music genre or singer as indicated by the group’s property neighborhood generation. In a Munich-based German com- and his like-minded people involved in the same group, we munity, friends are compared to neighbors of collaborative are interested in understanding whether this additional in- filtering for rating prediction. Their results showed that the formation could produce any practical benefits. Thus, in social friendship can benefit the traditional recommender this paper, our objective is to study whether and how to system. [19] proposed an online social recommender system best fuse both of membership and friendship, by means of attempting to use more social information for recommen- comparing their respective effects and potential combinative dation generation. The social data they introduced are the impacts via different fusion platforms. We believe that our friendship of users (from GeekBuddy), which was used to study will shed light on the role and applicability of social refine the description of each user. [10] proposed a factor information in boosting collaborative intelligence of current analysis approach based on probabilistic matrix factoriza- recommender systems. More specifically, our contributions tion to solve the data sparsity and poor prediction accuracy can be summarized as follows: problems, by employing both of users’ social network in- • We demonstrate the exact value of fusing explicit social formation and rating records. This work also concentrated relationships into recommender systems. Particularly, on using friendship to improve recommendations. However, in some cases, the neighborhood of users learnt from it has been shown that online friendship sometimes does membership has been found more accurate than from not work well due to its inherent ambiguity as a relational purely embracing friendship with traditional CF. descriptor [4, 16]. Compared to online friendship, online community membership contains more information about • We develop a framework to fuse and evaluate multiple users’ preferences. [2] used membership for recommending types of social relationships in a systematical approach online communities to members of the Orkut social network. through weighted-similarity calculation. Moreover, we However, their recommendations were on a per-community, Friendship Enhanced User Preference rather than on a per-user basis. I. Guy et al. [22] [23] built User-Item Friendship a people recommendation system named SONAR by lever- Similarity Similarity aging data from multiple channels including membership in project wiki. User 2.2 Graph-based Recommender Algorithm The computation of user/item similarity plays a key role Item User in user/item-based collaborative recommenders. Popular measurements of user similarity are Cosine similarity and Pearson’s correlation coefficient (see [9] for examples) based Friendship Matrix on the user-item rating matrix. The limitation is that they User + only use the local pairwise user information for neighbor- Group hood searching. Recent years, graph-based methods have been introduced to model relations between users and items from a global User perspective, and been used to seamlessly incorporate het- erogeneous data sources into the traditional user-item rating matrix. Huang proposed a two-level graph model for prod- Membership Matrix ucts [18], in which the two layers of nodes represent products and customers respectively, and three types of links between User-Item Membership nodes are: the product-product, the user-user, and the user- Similarity Similarity product link. The recommendation is generated based on Membership Enhanced User Preference the association strengths between a customer and products. Random walks on graph have been extensively discussed Figure 2: Fuse Friendship and Membership into Col- [12] and shown a rather good performance in the recom- laborative Recommender via Weighted-Similarity mendation area. M. Gori and A. Pucci proposed a random- walk based scoring algorithm, ItemRank [14], which can be used to rank products according to expected user prefer- mendation results. Our fusion framework aims at leverag- ences, so as to recommend top ranked items to potentially ing the two social relationships to strengthen the user sim- interested users. Similarly, Baluja et al. [3] made video rec- ilarity calculation process by two means: one is combining ommendations for YouTube through random walk on the the user-similarity from friendship and/or membership with view graph, which is a bipartite graph containing users and similarity from rating matrix in a weighted approach; the videos where links are visiting logs of users on videos. F. other is modeling the social relations and rating matrix all Fouss et al. [13] presented a new perspective on charac- in a graph, and then applying random walk on this graph to terizing the similarity between elements of a database or, compute the user similarity. more generally, nodes of a weighted and undirected graph. This similarity called L+ , the pseudoinverse of the Lapla- 3.1 Fusing via weighted-similarity cian matrix of the graph. Their experimental results on the In the rating matrix, we can view the preferences of users MovieLens database showed that the Laplacian-based simi- as feature vectors. Every user vector consists of n feature larity computation performed well in comparison with other slots, one for each available item. The values used to fill methods. those slots can be either the rating rak that a user ua pro- However, the limitation of related work in the ”fusion of vides to the corresponding item ik , or 0 if no such rating social relationship” (the first subsection) is that few have exists. Now, we can compute the proximity between two considered the potential positive role of membership. More- users ua and ub , by calculating the similarity between their over, in the related work on ”graph-based recommender al- vectors. For example, we can use the Cosine similarity for gorithm”, no work on its performance as a fusion platform this calculation as follows: has been conducted. In this paper, we therefore aim at in- Rua · Rub vestigating how to effectively fuse both of friendship and Similarity(ua , ub ) = (1) membership into CF algorithm via random walk approach ||Rua ||||Rub || in order to improve the performance and solve the sparsity where Rua and Rub are two vectors of ratings from users ua problem. To the best of our knowledge, our work is one of and ub respectively. the first attempts to use both friendship and membership to According to the user similarity calculated from the rat- enhance recommender systems. ing matrix, we give a detailed description of fusing social relationships via weighted user similarity based on Fig 2. When fusing friendship with user similarity from rating 3. FUSING SOCIAL RELATIONSHIPS INTO matrix, we need to get a user similarity based on the friend- RECOMMENDERS ship firstly. We represent the friendship in the form of a In this section, we mainly take into account of two types user-user matrix. If two users ui and uj are friends, then the of explicit social relationships: friendship and membership, value of cell uij is set to 1, otherwise 0. Based on this user- and propose a generic framework to fuse them with the user- user matrix, we calculate a friendship similarity by adopting item rating matrix. Given that user-user similarity com- Cosine Correlation, named Simf ri . Next, when calculating putation is crucial to collaborative recommenders, a more the final user similarity between ua and ub , we combine the accurate user-user similarity always leads to better recom- Simf ri with Simui (user similarity calculated from user-item matrix) in a weighted approach as follows: been motivated to further use graph-based random walk al- gorithm for modeling these social data (i.e. friendship and Simui+f ri (ua , ub ) = λSimui (ua , ub ) membership). We think that the transitivity of the graph (2) +(1 − λ)Simf ri (ua , ub ) will improve the computation of the similarity between two users. The parameter λ is used to adjust the weight of Simf ri and Simui , the bigger the λ is, the rating matrix plays a more important role in the combined similarity. Finally, we use 3.2.1 Graph Construction for Social Community this combined similarity Simui+f ri in finding neighbors for The first key issue is to construct a meaningful graph so each user. that the resulting similarity can truly reflects the prefer- When fusing the membership, firstly we also need to get ence similarity between users. Considering a graph G = a user-user similarity based on the membership data. Since (V, E, W ), where V is the set of nodes (users, items or com- membership is the relationship between the user and the munities, etc.), E is the set of edges which represent rela- communities/groups he/she joined, a new type of entity was tionships between all types of nodes, and W is the set of introduced besides the two entities (user and item). We weights for all edges. concretely represent the membership in the form of a user- The relationship data in a social community can be in- group matrix, where the rows indicate users and the columns teractive relationship, such that user watched a movie or indicate the groups joined by users. If a user ui joins group listened to a song; or be social relationship like membership gj , the value of cell uij is set to 1, otherwise 0. Based on this or friendship. Let us use Last.fm which contains all types user-group matrix, we can get a membership similarity by of these data for example. It can be modeled by a graph using Cosine Correlation too, named Simmem . As for the G in the following way: there are three types of nodes in generation of final user-user similarity, a weighted formula Last.fm, the user, artist (item) and group, and each ele- is applied where λ plays the same role as before. ment of the user, artist and group corresponds to a node of the graph; and the interactive relationship like a user lis- Simui+mem (ua , ub ) = λSimui (ua , ub ) tens to an artist, social relationship like user is a member of (3) +(1 − λ)Simmem (ua , ub ) a group, and user is a friend of another user is expressed as an edge. When a random walker walks on the graph, the dif- Furthermore, we are interested in seeing what will happen ference of node types were ignored, and we only care about if two types of social relations are fused together with the how many short paths existed between two nodes which have rating matrix. In this condition, we first calculate the user- directly impact on their similarity computation, so special user similarity Simf ri from friendship and Simmem from treatment is not needed for any type of nodes in our graph. membership independently, and then introduce two param- The weight wij of the edge connecting node i and node eters: λ and β to adjust the weights of three data sources as j should have a meaningful value. Traditionally, we usually shown in the equation 4. deal with interactive data in the following convention: the Simui+f ri+mem (ua , ub ) = λSimui (ua , ub ) + (1 − λ) more frequent the interaction between node i and node j, (4) the larger the value of wij , and consequently, the easier the (βSimmem (ua , ub ) + (1 − β)Simf ri (ua , ub )) communication through the edge. However, in the case of At the first level, λ is used to adjust the weight between rat- social community, besides interactive data we also have so- ing matrix and the other two social relationships; and then cial relationship, and they usually are not associated with β is used to adjust the remaining weight between friend- a frequent number, e.g. most users join a group only once, ship and membership. The bigger the λ is, the rating ma- and so is the same for adding friends. For the consistency trix plays a more important role; the bigger the β is, the of the two types of edges and simplicity, we set the weight membership plays a more dominant role in the combined of member of edge and friend of edge to be 1, and treat user-user similarity. listens to edge as follows: if a user listened to the songs of After the computation of user-user similarity for finding an artist more than 3 times, then the edge connecting the neighbors, the next step is to recommend items to users by user and the artist was weighted 1. We require the weights predicting each item’s ratings. The predicted rating ri,m of to be both positive wij > 0 and symmetric wij = wji , so a test item m for the user i is hence computed as: the graph we built is an undirected graph. PN When handling friendship between users, there are several j=1 sim(ui , uj ) ∗ rj,m approaches to transforming the pairwise similarity between ri,m = (5) PN nodes of the same type (e.g. users) into a graph [17], such j=1 sim(ui , uj ) as the -neighborhood graph, k-nearest neighbor graph, and where rj,m is the rating of user uj on the item im , and the fully connected graph. We choose the -neighborhood sim(ui , uj ) is the similarity between the current user ui and graph to fuse the friendship on the graph, not only because the neighbor uj . In fusing via weighted-similarity approach, it has computational advantage by using a sparse represen- sim(ui , uj ) can be similarity measures in equation 2, 3, and tation of the data, but also because it can filter out the 4, depends on the fusing strategy and data used each time. noisy data so that we can create a concrete friend set for each user. When constructing the -neighborhood graph, 3.2 Fusing via Graph we connect user-node pairs whose friendship similarities are In the “fusing via weighted-similarity” method, we cal- greater than . Given that the range of friendship similarity culated the user-user similarity based on the static “local” is [0,1], the range of  is also [0,1]. We enumerate  in this pairwise user information. As we know, social network is range with step 0.01, and finally we get the optimal . inherently in a graph structure with the transitivity charac- When the graph was built, for the corresponding sym- teristics as a key feature of social relations, therefore we have metric adjacency matrix A of graph G, the element aij was defined as: aij = wij if node i is connected to node j and aij 4. EXPERIMENTS = 0 otherwise. Thus, people who listen to the same artists and join same groups, will be connected by a comparatively 4.1 Data Sets larger number of short paths. Traditional data sets used in the evaluation of collabo- 3.2.2 Random Walk and Similarity Measures rative filtering systems, such as MovieLens, do not include explicit social relationship, while in Last.fm, a popular so- Random walk is a mathematical formalization of a trajec- cial music site, community information is available so that tory that consists of taking successive random steps. At each an entity-relation model can be generated which includes step, the next node in the walk is selected randomly from the relationship between users and items. the neighbors of the last node in the walk. The sequence For our purpose, we extracted two typical social relation- of visited nodes is a Markov Chain [20], with the transition ships: the friendship between users and the membership probability: which describes the user’s participation in groups, by ac- cessing its Web Service APIs 1 . Besides, we think it is more ( 1 d(i) , if (i, j) ∈ E pij = (6) meaningful to recommend artists instead of individual mu- 0 , otherwise sic since music is variable while preference on artists is more where d(i) is the degree of node i. constant, so we use artist as the ”item” in our recommenda- From the viewpoint of collaborative recommender sys- tions. A user and an item is linked if the user listened to tems, finding accurate neighbor set for each user is the cor- song(s) of the artist, and a user and a community is linked nerstone, so a good similarity measure on the graph is a if the user joined the group. The relationship between an crucial step. artist and a community is formed if the artist’s songs were Fouss et al. [13] has discussed several approaches to com- frequently listened by users in this group. There are also puting similarities between nodes of graph and their applica- links among users describing their friendship. tion to collaborative recommendations, that mainly included We concretely established an active data set consisting of distance-based measure like ECTD, and inner-product based 943 users, 1, 001 artists and 676 groups. There are 36, 424 measure like L+ . records in the user-artist matrix which sparsity degree (per- ECTD is the abbreviation for Euclidean Commute-Time centage of zero values in the matrix) is 96.14%, and 7, 038 Distance. Average commute-time is the average number of records in the user-group matrix which sparsity is 98.89%. steps that a random walker, starting in node i (i 6= j), will The total number of friendship of 943 selected users is 33776, take to enter node j for the first time and go back to i, which means each user on average has 35.8 friends. Please represented as n(i, j). Average commute-time is symmetric note that the rating matrix here is the user-artist matrix, by definition, and it is a distance measure. n(i, j)1/2 is also and if a user listened to song(s) of an artist, there is ”1” in a distance in Euclidean space, and it is named as Euclidean the corresponding cell. Commute-Time Distance. By means of 5 fold cross-validation 2 , each row (represent L+ is the Moore-Penrose pseudoinverse of the Laplacian a user) of the user-artist matrix is randomly split into five matrix L. The Laplacian matrix L of the graph is defined different sets. For each time of experiment, four-fifths of of as, L = the data is included in the training set and the other is used PnD - A, where D = Diag(ai. ), with dii = [D]ii = as the testing data. ai. = j=1 aij , and A is the adjacent matrix of graph G. Let e be a column vector with 1(i.e., e = [1, 1, . . . , 1]T , where T denotes the matrix transpose), then L+ can be computed 4.2 Evaluation Metrics with the formula: We adopted standard metrics in the area of information retrieval to evaluate our recommenders. During each round L+ = (L − eeT /n)−1 + eeT /n, (7) of cross-validation, we recommend and rank a set of poten- where n is the number of nodes. If we define ei as the tial artists for each user. We then compare the predicted rec- ith column of I, ei = [0, . . . , 0, 1, 0, . . . , 0]T (1 is in the ith ommendation list with true preferences on artists in the test column), then we can explain the relation between average set, and compute precision, recall, and F-measure scores. commute-time and L+ in the form 1. Recall. The score measures the average (on all users) n(i, j) = VG (ei − ej )T L+ (ei − ej ), (8) of the proportion (in percentages) of artists from the where each node i is represented by a unit vector ei in the testing sets that appear among the top n ranked list node space spanned by {ei }. It can be proved that the node from the training sets, for some given n. It should be as vector ei can be transformed into a new Euclidean space, and high as possible for good performance. We computed the elements of L+ (lij ) are the inner products between these the recall from Top-1 to Top-20 artists(for a total of transformed node vectors. Therefore L+ is a kernel matrix(a 1001 artists). Gram matrix) and can be used as a similarity matrix for the nodes. 2. Precision. This metric measures the proportion of According to Fouss’ study, the inner-product based sim- recommended items that are ground truth items. Note ilarity measure L+ provides better and more stable results that the items in the profiles of the testing data rep- for collaborative recommenders, so we adopt L+ metric to resent only a fraction of the items that the user truly measure the similarity between users, which means in equa- accessed. tion 5, sim(ui , uj ) equals to lij in this case. 1 The web page is http://www.last.fm/api 2 The number of fold is the number of tested sets. 3. F-measure. F-measure is the weighted harmonic mean Fusing friendship on User-Item Matrix of precision and recall. The equation is as follows: 15 friendship fusion 2 ∗ precision ∗ recall 14 F = (9) (precision + recall) 13 F-Measure(%) In the following, we report the results of recommending 12 the top 1, 2, 5, 10 and 20 artists. At each pass, 50 users are taken as neighbors based on different similarity measures for 11 recommendation. 10 4.3 Experiment Design and Results At first, we evaluated the fusion approach via weighted- 9 similarity. A user-based collaborative filtering recommender 8 was first run on the user-artist rating matrix, resulting in the 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 baseline represented by CFU I (see Tables 1 to 3 respectively Lambda showing precision, recall and F-measure scores). Figure 3: F-measure changes as lambda change in Table 1: Precision of Fusion via Weighted-Similarity friendship fusion Approaches Precision Top 1 Top 2 Top 5 Top 10 Top 20 CFU I 29.93 25.12 19.09 15.23 11.33 Fusing member on User-Item Matrix W Sf ri+U I 29.67 25.33 19.64 15.51 11.48 15 membership fusion W Smem+U I 29.44 25.49 19.58 15.33 11.40 14 W Sf ri+mem+U I 29.37 25.45 19.58 15.34 11.42 13 F-Measure(%) 12 Table 2: Recall of Fusion via Weighted-Similarity 11 Approaches Recall Top 1 Top 2 Top 5 Top 10 Top 20 10 CFU I 3.87 6.50 12.36 19.72 29.34 W Sf ri+U I 3.84 6.56 12.71 20.07 29.73 9 W Smem+U I 3.81 6.60 12.67 19.84 29.52 W Sf ri+mem+U I 3.80 6.59 12.68 19.86 29.58 8 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Lambda Table 3: F-measure of Fusion via Weighted- Figure 4: F-measure changes as lambda change in Similarity Approaches membership fusion F-measure Top 1 Top 2 Top 5 Top 10 Top 20 CFU I 6.85 10.33 15.01 17.19 16.35 W Sf ri+U I 6.80 10.42 15.43 17.50 16.56 Next, we fused membership with rating matrix and also W Smem+U I 6.75 10.49 15.38 17.29 16.45 used the parameter λ to adjust the weight of rating matrix W Sf ri+mem+U I 6.73 10.47 15.39 17.31 16.48 and membership in the similarity calculation. From the figure 4, it can be seen that when λ = 0.8, we Then, we tried to fuse friendship with the rating matrix can get best results overall. The third rows of Tables 1 to and used λ to adjust the weight of rating matrix and friend- 3, represented by W Smem+U I , respectively show precision, ship while computing user similarity. Figure 3 shows how recall and F-measure values when λ = 0.8. It shows that F-measure changes with the changing of λ. It achieves the the results slightly improve on the baseline, but are a little peak when λ = 0.7, which means that the rating matrix weaker than the fusion of friendship on the rating matrix. contributes to 70% percent of the weight while friendship We further fused two relationships together on the rat- contributes to 30% in calculating the user-user similarity. ing matrix and adopted two parameters: λ and β to adjust Specifically, results in the second rows of Tables 1 to 3, the weights for the three data sources as shown in Formula represented by W Sf ri+U I , give the precision, recall and F- 4. Experimental results indicate that the hybrid fusion per- measure scores when λ = 0.7. It can be seen that it is forms best when lambda = 0.8 and beta = 0.5 (which means better than the baseline, especially when returning top 5 rating matrix contributes 80%, friendship and membership recommendations (the improvement of F-measure achieved respectively contributes 10% and 20% in the user-user sim- up to 2.79%). In the process of tuning λ and β in the fol- ilarity calculation). These results are illustrated in the last lowing, we considered the average score of Top-1 to Top-20 rows of Tables 1 to 3. To our surprise, compared to fusion of recommendations. friendship and membership separately, it did not have dis- Weighted-Similarity Fusion: Rating + Friendship 12 Weighted-Similarity Fusion: Rating + Membership Table 5: Table of improvements on F-Measure by Percentage of F-measure improvement Weighted-Similarity Fusion: Rating+Friendship+Membership Graph Fusion: Rating + Friendship + Membership Comparing with the Baseline 10 Improvements Top 1 Top 2 Top 5 Top 10 Top 20 8 W Sf ri+U I -0.73 0.87 2.80 1.80 1.28 6 W Smem+U I -1.46 1.55 2.47 0.58 0.61 W Sf ri+mem+U I -1.75 1.36 2.53 0.70 0.80 4 Gf ri+mem+U I 4.82 7.94 7.99 4.25 4.04 2 0 menting recommendations, and particularly that the graph- -2 based fusion is more effective in bringing into play of the 1 2 5 10 20 power of social data. To the best of our knowledge, the Size of Top-N recommendations work is one of the first attempts to explore the effect of membership in addition to friendship, and to fuse both of Figure 5: Improvements on F-Measure of All the them based on random walk graph model with collaborative Fusion Approaches filtering (CF) systems. For the next step, we are interested in further exploring the impact of social relationships on recommender systems tinct difference and was actually even a little weaker than from three aspects: one is to explore other potential re- the pure fusing of friendship. lationships, such as the relation between items and asso- In order to identify whether the fusion via graph model ciated groups, other social relationships besides friendship would perform better than via weighted similarity approach and membership, such as the reporting chain in a company, given that social data are inherently in the graph structure, to see how to model and utilize these data in order to make we finally did the experiment of fusing the two social rela- better recommendations; another direction is to explore how tionships on the graph and applied random walk algorithm to enhance Random Walk model so as to handle heteroge- to calculate neighborhood similarity. neous data in a more fine-grained way, based on the method Based on the graph construction method described before, proposed in [25]; finally, as the explosion of the size of social we firstly run a series of simulations to learn the optimal . websites, we need to pay more attention to the algorithm’s After running 100 times, the optimal  that we got is 0.05. scalability and efficiency, when the social graph grows with The comparative results under threshold 0.05 are then com- millions of nodes. puted and listed in Table 5. It shows that precision, recall and F-measure scores are all highly improved compared to the fusion via weighted-similarity approach. In particular, 6. ACKNOWLEDGMENTS when returning top 2 and top 5 recommendations, the im- We thank Lawrence Bergman of IBM T.J. Watson Re- provements significantly reached 7.94% and 7.99% respec- search Center for his generous help and valuable comments tively. on this work. Table 4: Fusing friendship and membership via 7. REFERENCES Graph [1] C. Lam. Snack: incorporating social network Gf ri+mem+U I Top 1 Top 2 Top 5 Top 10 Top 20 information in automated collaborative filtering. In Precision 31.30 27.12 20.62 15.88 11.79 EC ’04: Proceedings of the 5th ACM conference on Recall 4.05 7.02 13.35 20.55 30.54 Electronic commerce, pages 254–255, New York, NY, F-Measure 7.18 11.15 16.21 17.92 17.01 USA, 2004. ACM. [2] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in Figure 5 further shows that while returning Top-1 recom- the orkut social network. In KDD ’05: Proceedings of mendation, the fusion via graph can achieve an improvement the eleventh ACM SIGKDD international conference at 4.82%, and others are however all slightly weaker than on Knowledge discovery in data mining, pages the baseline. When returning top N recommendations from 678–684, New York, NY, USA, 2005. ACM. 2 to 10, all of the fusion approaches enhance the baseline [3] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, CF method, which positively proves the usefulness of social S. Kumar, D. Ravichandran, and M. Aly. Video relationship data especially when multiple recommendations suggestion and discovery for youtube: taking random are computed. It also indicates that the fusion via graph can walks through the view graph. In J. Huai, R. Chen, boost the baseline significantly by up to 8%, demonstrating H.-W. Hon, Y. Liu, W.-Y. Ma, A. Tomkins, and that the graph model is a more proper way to fuse explicit X. Zhang, editors, WWW, pages 895–904. ACM, 2008. social relationships. [4] D. Boyd. Friends, friendsters, and myspace top 8: Writing community into being on social network sites. 5. CONCLUSION AND FUTURE WORK http://www.firstmonday.org/issues/issue11 12/boyd/ In this paper, we presented two principal methods to inte- index.html, December 2006. grate explicit social relationships into traditional CF meth- [5] N. Baym. How Good A Friend Is A Last.fm Friend? ods: the weighted-similarity fusion and the graph fusion. We http://www.last.fm/user/popgurl/journal/2008/04/28/ demonstrated the effectiveness of social relationships in aug- 3d9l how good a friend is a last.fm friend. [6] M. Balabanović and Y. Shoham. Fab: content-based, [21] I. Konstas, V. Stathopoulos, and J. M. Jose. On social collaborative recommendation. Commun. ACM, networks and collaborative recommendation. In SIGIR 40(3):66–72, 1997. ’09: Proceedings of the 32nd international ACM [7] P. Melville, R. J. Mooney, and R. Nagarajan. SIGIR conference on Research and development in Content-boosted collaborative filtering for improved information retrieval, pages 195–202, New York, NY, recommendations. In Eighteenth national conference USA, 2009. ACM. on Artificial intelligence, pages 187–192, Menlo Park, [22] I. Guy, I. Ronen, and E. Wilcox. Do you know?: CA, USA, 2002. American Association for Artificial recommending people to invite into your social Intelligence. network. In IUI ’09: Proceedings of the 13th [8] M. J. Pazzani. A framework for collaborative, international conference on Intelligent user interfaces, content-based and demographic filtering. Artif. Intell. pages 77–86, New York, NY, USA, 2009. ACM. Rev., 13(5-6):393–408, 1999. [23] J. Chen, W. Geyer, C. Dugan, M. Muller, and I. Guy. [9] J. S. Breese, D. Heckerman, and C. M. Kadie. Make new friends, but keep the old: recommending Empirical analysis of predictive algorithms for people on social networking sites. In CHI ’09: collaborative filtering. In G. F. Cooper and S. Moral, Proceedings of the 27th international conference on editors, Proceedings of the 14th Conference on Human factors in computing systems, pages 201–210, Uncertainty in Artificial Intelligence, pages 43–52, New York, NY, USA, 2009. ACM. 1998. [24] J. Wang, A. P. de Vries, and M. J. T. Reinders. A [10] H. Ma, H. Yang, M. R. Lyu, and I. King. Sorec: social user-item relevance model for log-based collaborative recommendation using probabilistic matrix filtering. In M. Lalmas, A. MacFarlane, S. M. Rüger, factorization. In CIKM ’08: Proceeding of the 17th A. Tombros, T. Tsikrika, and A. Yavlinsky, editors, ACM conference on Information and knowledge ECIR, volume 3936 of Lecture Notes in Computer management, pages 931–940, New York, NY, USA, Science, pages 37–48. Springer, 2006. 2008. ACM. [25] J. Zhang, J. Tang, B. Liang, Z. Yang, S. Wang, J. Zuo, [11] J. Golbeck. Generating predictive movie and J. Li. Recommendation over a heterogeneous recommendations from trust in social networks. pages social network. Web-Age Information Management, 93–104. 2006. International Conference on, 0:309–316, 2008. [12] P. G. Doyle and J. L. Snell. Random walks and electric networks, 2000. [13] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. Knowledge and Data Engineering, IEEE Transactions on, 19(3):355–369, March 2007. [14] M. Gori and A. Pucci. Itemrank: A random-walk based scoring algorithm for recommender engines. In M. M. Veloso, editor, IJCAI, pages 2766–2771, 2007. [15] G. Groh and C. Ehmig. Recommendations in taste related domains: collaborative filtering vs. social filtering. In GROUP ’07: Proceedings of the 2007 international ACM conference on Supporting group work, pages 127–136, New York, NY, USA, 2007. ACM. [16] R. Gross, A. Acquisti, and J. H. Heinz. Information revelation and privacy in online social networks. In WPES ’05: Proceedings of the 2005 ACM workshop on Privacy in the electronic society, pages 71–80, New York, NY, USA, 2005. ACM Press. [17] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, December 2007. [18] Z. Huang. Graph-based analysis for e-commerce recommendation. PhD thesis, Tucson, AZ, USA, 2005. Adviser-Hsinchun Chen and Adviser-Daniel D. Zeng. [19] H. G. Hummel, B. van den Berg, A. J. Berlanga, H. Drachsler, J. Janssen, R. Nadolski, and R. Koper. Combining social-based and information-based approaches for personalised recommendation on sequencing learning activities. International Journal of Learning Technology, 3:152–168(17), 12 August 2007. [20] J. G. Kemeny and J. L. Snell. Finite Markov Chains. Springer-Verlag, 1976.