Augmenting Collaborative Recommender by Fusing
                    Explicit Social Relationships

            Quan Yuan, Shiwan Zhao                                             Li Chen                              Yan Liu
                   IBM China Research                            Department of Computer                  IBM T.J. Watson Research
                        Laboratory                              Science, Hong Kong Baptist                         Center
                  Beijing, 100193, China                                 University                         Yorktown, NY 10598
                    {quanyuan,                                          Hong Kong                           liuya@us.ibm.com
               zhaosw}@cn.ibm.com                             lichen@comp.hkbu.edu.hk
                    Shengchao Ding                                       Xiatian Zhang                         Wentao Zheng
                Institute of Computing                               IBM China Research                      IBM China Research
            Technology, Chinese Academy                                   Laboratory                              Laboratory
                      of Sciences                                   Beijing, 100193, China                  Beijing, 100193, China
               Beijing, 100190, China                             xiatianz@cn.ibm.com                    zhengwt@cn.ibm.com
            dingshengchao@ict.ac.cn

ABSTRACT                                                                             Keywords
Nowadays social websites have become a major trend in the                            Recommender Systems, Collaborative Filtering, Social Re-
Web 2.0 environment, enabling abundant social data avail-                            lationship, Random Walk
able. In this paper, we explore the role of two types of social
relationships: membership and friendship, while being fused                          1.   INTRODUCTION
with traditional CF (Collaborative Filtering) recommender                               In recent years, collaborative-filtering (CF) based recom-
methods in order to more accurately predict users’ interests                         mender systems have been widely developed in order to ef-
and produce recommendations to them. Through an ex-                                  fectively support users’ decision-making process especially
ploratory evaluation with real-life dataset from Last.fm, we                         when they are confronted with overwhelming information
have revealed respective effects of the two explicit relation-                       (e.g. a large amount of product options that popularly ap-
ships and furthermore their combinative impacts. In addi-                            pear in the current Web environment). There are two basic
tion, the fusion is conducted via random walk graph model in                         entities considered by the recommender: the user and the
comparison with via weighted neighborhood similarity ma-                             item. The user provides his rates on items (e.g. movies,
trix, so as to identify the best performance platform. In-                           music, books, etc.) that he has experienced, based on which
depth analysis on the experimental data particularly shows                           the system can connect him with persons who have similar
the significant improvement by up to 8% on recommenda-                               interests and then recommend to him items that are pre-
tion accuracy, by embedding social relationships in CF via                           ferred by these like-minded neighbors. In some cases which
graph model.                                                                         don’t have rating values available, the user’s interaction with
                                                                                     items can also be considered. That is, if we can only get
                                                                                     the information that a user watched a movie, the ”rating”
Categories and Subject Descriptors                                                   is 1; otherwise it is 0. The recommendation method based
H.5.3 [Group and Organization Interfaces]: [Collabora-                               on this binary rating matrix is also named as log-based CF
tive Filtering, Computer-supported cooperative work, Eval-                           [24]. The traditional approaches can be hence regarded as
uation/methodology]; H.3.3 [Information Storage and                                  implicit ways to infer the social relationship between users.
Retrieval]: [Information Filtering]                                                  However, it inevitably brings the limitation when few users
                                                                                     rated or viewed few items (i.e. the sparsity problem), to
                                                                                     make it hard to infer such preference relationship. With
                                                                                     the increasing emergence of social network services, many
General Terms                                                                        websites support online user communities, such as Youtube,
Algorithms, Experimentation, Human Factors                                           Last.fm, del.icio.us, and e-commerce sites including Ama-
                                                                                     zon.com and eBay.com. Community facilities are provided
                                                                                     so that users can create and access to their community infor-
                                                                                     mation and communicate with friends or members. For ex-
                                                                                     ample, on Last.fm (a popular music recommender website),
Permission to make digital or hard copies of all or part of this work for            the user can establish friendship with others by ”finding peo-
personal or classroom use is granted without fee provided that copies are            ple” and/or join a group of users having similar music tastes
not made or distributed for profit or commercial advantage and that copies           (e.g. through ”finding groups”).
bear this notice and the full citation on the first page. To copy otherwise, to         We term this kind of community relationship as explicit
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.                                                             social relationship, since it is directly defined by users, rather
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00.                                     than inferred by the system. As a matter of fact, two types
                    Explicit social relationships                         propose a novel graph model to fuse the social rela-
                                                                          tionship with rating matrix, and adopt random walk
             Friendship                     Membership                    algorithm to produce neighborhood similarities for rec-
                                                                          ommendations. With a real-life dataset, we compared
                                                                          it with the weighted-similarity approach and identify
                               Fusion                                     the superior performance of graph fusion in improving
                                                                          recommendation accuracy.
           Standard collaborative filtering recommenders
                                                                       In the next section, we first provide a brief review of re-
              based on implicit relationship inferring              lated work, and then propose a systematical approach to
                                                                    study the use of explicit social relations and incorporate
                                                                    them in collaborative recommenders via graph model in ad-
Figure 1: Fusion of Friendship and Membership into
                                                                    dition to via weighted-similarity. Next, we show the ex-
Standard CF
                                                                    periment design and results analysis, followed by the final
                                                                    section of conclusion and future work.
of explicit relationships are commonly available on the cur-
rent websites (as in the example of Last.fm): friendship            2.    RELATED WORK
and membership. For instance, users A and B are friends
                                                                       We review the related literature of fusing heterogeneous
given that A adds B in his friend list (or conversely), but
                                                                    data sources with rating matrix to improve standard CF
it does not refer that they must have similar music tastes.
                                                                    from two perspectives: Fusion of Social Relationship, and
On the other hand, if A and B join in the same group, it
                                                                    Graph-based Recommender Algorithm.
indicates that they have the membership relation and are
likely with the similar interest (e.g. on ”Beatles” provided it     2.1    Fusion of Social Relationship
is the group’s title).
                                                                       Since popular user-based and item-based CF algorithms
   It is believed that explicit social relationship can be likely
                                                                    that only rely on user-item rating matrix always suffer from
applied to compensate the limitation of implicit relation-
                                                                    sparse and imbalance of rating data, researchers have started
ship inference approach and improve the accuracy of recom-
                                                                    to incorporate other data sources to improve standard CF.
mender systems [1]. Some researchers have recently been
                                                                    Balabanovic et al. [6] were among those who first investigate
engaged in fusing such kind of information into traditional
                                                                    content-based systems that make use of the descriptive data
CF methods [1, 15, 19]. However, to our knowledge, most
                                                                    about an item. Melville et al. [7] enhanced CF by using
works purely concentrate on friendship data [4, 16]. Their
                                                                    content of a movie, e.g., movie genre. Pazzani [8] investi-
tentative studies unfortunately show that the fusion some-
                                                                    gated hybrid methods using both of user data (demographic
times does not work well due to the friendship’s inherent
                                                                    information) and item data (content) for improving recom-
ambiguity as a relational descriptor [5]. Moreover, existing
                                                                    mendation accuracy.
fusion approaches have been mainly based on the rating-
                                                                       Recently, with the increasing development of social web-
matrix. It is in essence lack of exploration of other possible
                                                                    sites and appearance of social data, researchers have begun
ways that may be potentially more effectively able to fuse
                                                                    to pay attention to the social data and explored its usage in
the explicit social relationship with CF recommenders. As
                                                                    recommender systems. Konstas [21] adopted Random Walk
the social data is inherently in a graph structure, fusion via
                                                                    with Restart to model the friendship and social annotation
graph may be a good approach.
                                                                    (tagging) in a music track recommendation system. Gol-
   Given that the membership contains more information
                                                                    beck [11] used trust relationship in social network to improve
implying the user’s preferences, such as his interest on the
                                                                    movie recommendations. [15] used social network data for
music genre or singer as indicated by the group’s property
                                                                    neighborhood generation. In a Munich-based German com-
and his like-minded people involved in the same group, we
                                                                    munity, friends are compared to neighbors of collaborative
are interested in understanding whether this additional in-
                                                                    filtering for rating prediction. Their results showed that the
formation could produce any practical benefits. Thus, in
                                                                    social friendship can benefit the traditional recommender
this paper, our objective is to study whether and how to
                                                                    system. [19] proposed an online social recommender system
best fuse both of membership and friendship, by means of
                                                                    attempting to use more social information for recommen-
comparing their respective effects and potential combinative
                                                                    dation generation. The social data they introduced are the
impacts via different fusion platforms. We believe that our
                                                                    friendship of users (from GeekBuddy), which was used to
study will shed light on the role and applicability of social
                                                                    refine the description of each user. [10] proposed a factor
information in boosting collaborative intelligence of current
                                                                    analysis approach based on probabilistic matrix factoriza-
recommender systems. More specifically, our contributions
                                                                    tion to solve the data sparsity and poor prediction accuracy
can be summarized as follows:
                                                                    problems, by employing both of users’ social network in-
   • We demonstrate the exact value of fusing explicit social       formation and rating records. This work also concentrated
     relationships into recommender systems. Particularly,          on using friendship to improve recommendations. However,
     in some cases, the neighborhood of users learnt from           it has been shown that online friendship sometimes does
     membership has been found more accurate than from              not work well due to its inherent ambiguity as a relational
     purely embracing friendship with traditional CF.               descriptor [4, 16]. Compared to online friendship, online
                                                                    community membership contains more information about
   • We develop a framework to fuse and evaluate multiple           users’ preferences. [2] used membership for recommending
     types of social relationships in a systematical approach       online communities to members of the Orkut social network.
     through weighted-similarity calculation. Moreover, we          However, their recommendations were on a per-community,
                                                                                   Friendship Enhanced User Preference
rather than on a per-user basis. I. Guy et al. [22] [23] built                          User-Item         Friendship
a people recommendation system named SONAR by lever-
                                                                                        Similarity        Similarity
aging data from multiple channels including membership in
project wiki.                                                                                                          User


2.2   Graph-based Recommender Algorithm
   The computation of user/item similarity plays a key role                      Item                                            User
in user/item-based collaborative recommenders. Popular
measurements of user similarity are Cosine similarity and
Pearson’s correlation coefficient (see [9] for examples) based                                               Friendship Matrix
on the user-item rating matrix. The limitation is that they            User                     +
only use the local pairwise user information for neighbor-                                                         Group
hood searching.
   Recent years, graph-based methods have been introduced
to model relations between users and items from a global                                                                         User
perspective, and been used to seamlessly incorporate het-
erogeneous data sources into the traditional user-item rating
matrix. Huang proposed a two-level graph model for prod-                                                    Membership Matrix
ucts [18], in which the two layers of nodes represent products
and customers respectively, and three types of links between                            User-Item        Membership
nodes are: the product-product, the user-user, and the user-                            Similarity        Similarity
product link. The recommendation is generated based on                            Membership Enhanced User Preference
the association strengths between a customer and products.
   Random walks on graph have been extensively discussed         Figure 2: Fuse Friendship and Membership into Col-
[12] and shown a rather good performance in the recom-           laborative Recommender via Weighted-Similarity
mendation area. M. Gori and A. Pucci proposed a random-
walk based scoring algorithm, ItemRank [14], which can be
used to rank products according to expected user prefer-         mendation results. Our fusion framework aims at leverag-
ences, so as to recommend top ranked items to potentially        ing the two social relationships to strengthen the user sim-
interested users. Similarly, Baluja et al. [3] made video rec-   ilarity calculation process by two means: one is combining
ommendations for YouTube through random walk on the              the user-similarity from friendship and/or membership with
view graph, which is a bipartite graph containing users and      similarity from rating matrix in a weighted approach; the
videos where links are visiting logs of users on videos. F.      other is modeling the social relations and rating matrix all
Fouss et al. [13] presented a new perspective on charac-         in a graph, and then applying random walk on this graph to
terizing the similarity between elements of a database or,       compute the user similarity.
more generally, nodes of a weighted and undirected graph.
This similarity called L+ , the pseudoinverse of the Lapla-      3.1     Fusing via weighted-similarity
cian matrix of the graph. Their experimental results on the         In the rating matrix, we can view the preferences of users
MovieLens database showed that the Laplacian-based simi-         as feature vectors. Every user vector consists of n feature
larity computation performed well in comparison with other       slots, one for each available item. The values used to fill
methods.                                                         those slots can be either the rating rak that a user ua pro-
   However, the limitation of related work in the ”fusion of     vides to the corresponding item ik , or 0 if no such rating
social relationship” (the first subsection) is that few have     exists. Now, we can compute the proximity between two
considered the potential positive role of membership. More-      users ua and ub , by calculating the similarity between their
over, in the related work on ”graph-based recommender al-        vectors. For example, we can use the Cosine similarity for
gorithm”, no work on its performance as a fusion platform        this calculation as follows:
has been conducted. In this paper, we therefore aim at in-
                                                                                                            Rua · Rub
vestigating how to effectively fuse both of friendship and                    Similarity(ua , ub ) =                                    (1)
membership into CF algorithm via random walk approach                                                     ||Rua ||||Rub ||
in order to improve the performance and solve the sparsity       where Rua and Rub are two vectors of ratings from users ua
problem. To the best of our knowledge, our work is one of        and ub respectively.
the first attempts to use both friendship and membership to        According to the user similarity calculated from the rat-
enhance recommender systems.                                     ing matrix, we give a detailed description of fusing social
                                                                 relationships via weighted user similarity based on Fig 2.
                                                                   When fusing friendship with user similarity from rating
3.    FUSING SOCIAL RELATIONSHIPS INTO                           matrix, we need to get a user similarity based on the friend-
      RECOMMENDERS                                               ship firstly. We represent the friendship in the form of a
   In this section, we mainly take into account of two types     user-user matrix. If two users ui and uj are friends, then the
of explicit social relationships: friendship and membership,     value of cell uij is set to 1, otherwise 0. Based on this user-
and propose a generic framework to fuse them with the user-      user matrix, we calculate a friendship similarity by adopting
item rating matrix. Given that user-user similarity com-         Cosine Correlation, named Simf ri . Next, when calculating
putation is crucial to collaborative recommenders, a more        the final user similarity between ua and ub , we combine the
accurate user-user similarity always leads to better recom-      Simf ri with Simui (user similarity calculated from user-item
matrix) in a weighted approach as follows:                            been motivated to further use graph-based random walk al-
                                                                      gorithm for modeling these social data (i.e. friendship and
             Simui+f ri (ua , ub ) = λSimui (ua , ub )                membership). We think that the transitivity of the graph
                                                                (2)
                           +(1 − λ)Simf ri (ua , ub )                 will improve the computation of the similarity between two
                                                                      users.
The parameter λ is used to adjust the weight of Simf ri and
Simui , the bigger the λ is, the rating matrix plays a more
important role in the combined similarity. Finally, we use            3.2.1    Graph Construction for Social Community
this combined similarity Simui+f ri in finding neighbors for             The first key issue is to construct a meaningful graph so
each user.                                                            that the resulting similarity can truly reflects the prefer-
   When fusing the membership, firstly we also need to get            ence similarity between users. Considering a graph G =
a user-user similarity based on the membership data. Since            (V, E, W ), where V is the set of nodes (users, items or com-
membership is the relationship between the user and the               munities, etc.), E is the set of edges which represent rela-
communities/groups he/she joined, a new type of entity was            tionships between all types of nodes, and W is the set of
introduced besides the two entities (user and item). We               weights for all edges.
concretely represent the membership in the form of a user-               The relationship data in a social community can be in-
group matrix, where the rows indicate users and the columns           teractive relationship, such that user watched a movie or
indicate the groups joined by users. If a user ui joins group         listened to a song; or be social relationship like membership
gj , the value of cell uij is set to 1, otherwise 0. Based on this    or friendship. Let us use Last.fm which contains all types
user-group matrix, we can get a membership similarity by              of these data for example. It can be modeled by a graph
using Cosine Correlation too, named Simmem . As for the               G in the following way: there are three types of nodes in
generation of final user-user similarity, a weighted formula          Last.fm, the user, artist (item) and group, and each ele-
is applied where λ plays the same role as before.                     ment of the user, artist and group corresponds to a node of
                                                                      the graph; and the interactive relationship like a user lis-
            Simui+mem (ua , ub ) = λSimui (ua , ub )                  tens to an artist, social relationship like user is a member of
                                                                (3)
                          +(1 − λ)Simmem (ua , ub )                   a group, and user is a friend of another user is expressed as
                                                                      an edge. When a random walker walks on the graph, the dif-
   Furthermore, we are interested in seeing what will happen
                                                                      ference of node types were ignored, and we only care about
if two types of social relations are fused together with the
                                                                      how many short paths existed between two nodes which have
rating matrix. In this condition, we first calculate the user-
                                                                      directly impact on their similarity computation, so special
user similarity Simf ri from friendship and Simmem from
                                                                      treatment is not needed for any type of nodes in our graph.
membership independently, and then introduce two param-
                                                                         The weight wij of the edge connecting node i and node
eters: λ and β to adjust the weights of three data sources as
                                                                      j should have a meaningful value. Traditionally, we usually
shown in the equation 4.
                                                                      deal with interactive data in the following convention: the
      Simui+f ri+mem (ua , ub ) = λSimui (ua , ub ) + (1 − λ)         more frequent the interaction between node i and node j,
                                                                (4)   the larger the value of wij , and consequently, the easier the
            (βSimmem (ua , ub ) + (1 − β)Simf ri (ua , ub ))
                                                                      communication through the edge. However, in the case of
At the first level, λ is used to adjust the weight between rat-       social community, besides interactive data we also have so-
ing matrix and the other two social relationships; and then           cial relationship, and they usually are not associated with
β is used to adjust the remaining weight between friend-              a frequent number, e.g. most users join a group only once,
ship and membership. The bigger the λ is, the rating ma-              and so is the same for adding friends. For the consistency
trix plays a more important role; the bigger the β is, the            of the two types of edges and simplicity, we set the weight
membership plays a more dominant role in the combined                 of member of edge and friend of edge to be 1, and treat
user-user similarity.                                                 listens to edge as follows: if a user listened to the songs of
   After the computation of user-user similarity for finding          an artist more than 3 times, then the edge connecting the
neighbors, the next step is to recommend items to users by            user and the artist was weighted 1. We require the weights
predicting each item’s ratings. The predicted rating ri,m of          to be both positive wij > 0 and symmetric wij = wji , so
a test item m for the user i is hence computed as:                    the graph we built is an undirected graph.
                         PN                                              When handling friendship between users, there are several
                           j=1 sim(ui , uj ) ∗ rj,m                   approaches to transforming the pairwise similarity between
                ri,m =                                      (5)
                            PN                                        nodes of the same type (e.g. users) into a graph [17], such
                               j=1 sim(ui , uj )
                                                                      as the -neighborhood graph, k-nearest neighbor graph, and
where rj,m is the rating of user uj on the item im , and              the fully connected graph. We choose the -neighborhood
sim(ui , uj ) is the similarity between the current user ui and       graph to fuse the friendship on the graph, not only because
the neighbor uj . In fusing via weighted-similarity approach,         it has computational advantage by using a sparse represen-
sim(ui , uj ) can be similarity measures in equation 2, 3, and        tation of the data, but also because it can filter out the
4, depends on the fusing strategy and data used each time.            noisy data so that we can create a concrete friend set for
                                                                      each user. When constructing the -neighborhood graph,
3.2    Fusing via Graph                                               we connect user-node pairs whose friendship similarities are
  In the “fusing via weighted-similarity” method, we cal-             greater than . Given that the range of friendship similarity
culated the user-user similarity based on the static “local”          is [0,1], the range of  is also [0,1]. We enumerate  in this
pairwise user information. As we know, social network is              range with step 0.01, and finally we get the optimal .
inherently in a graph structure with the transitivity charac-            When the graph was built, for the corresponding sym-
teristics as a key feature of social relations, therefore we have     metric adjacency matrix A of graph G, the element aij was
defined as: aij = wij if node i is connected to node j and aij            4.      EXPERIMENTS
= 0 otherwise. Thus, people who listen to the same artists
and join same groups, will be connected by a comparatively                4.1      Data Sets
larger number of short paths.
                                                                             Traditional data sets used in the evaluation of collabo-
3.2.2     Random Walk and Similarity Measures                             rative filtering systems, such as MovieLens, do not include
                                                                          explicit social relationship, while in Last.fm, a popular so-
   Random walk is a mathematical formalization of a trajec-
                                                                          cial music site, community information is available so that
tory that consists of taking successive random steps. At each
                                                                          an entity-relation model can be generated which includes
step, the next node in the walk is selected randomly from
                                                                          the relationship between users and items.
the neighbors of the last node in the walk. The sequence
                                                                             For our purpose, we extracted two typical social relation-
of visited nodes is a Markov Chain [20], with the transition
                                                                          ships: the friendship between users and the membership
probability:
                                                                          which describes the user’s participation in groups, by ac-
                                                                          cessing its Web Service APIs 1 . Besides, we think it is more
                        (
                             1
                           d(i)
                                , if (i, j) ∈ E
                 pij =                                    (6)             meaningful to recommend artists instead of individual mu-
                           0 , otherwise
                                                                          sic since music is variable while preference on artists is more
where d(i) is the degree of node i.                                       constant, so we use artist as the ”item” in our recommenda-
   From the viewpoint of collaborative recommender sys-                   tions. A user and an item is linked if the user listened to
tems, finding accurate neighbor set for each user is the cor-             song(s) of the artist, and a user and a community is linked
nerstone, so a good similarity measure on the graph is a                  if the user joined the group. The relationship between an
crucial step.                                                             artist and a community is formed if the artist’s songs were
   Fouss et al. [13] has discussed several approaches to com-             frequently listened by users in this group. There are also
puting similarities between nodes of graph and their applica-             links among users describing their friendship.
tion to collaborative recommendations, that mainly included                  We concretely established an active data set consisting of
distance-based measure like ECTD, and inner-product based                 943 users, 1, 001 artists and 676 groups. There are 36, 424
measure like L+ .                                                         records in the user-artist matrix which sparsity degree (per-
   ECTD is the abbreviation for Euclidean Commute-Time                    centage of zero values in the matrix) is 96.14%, and 7, 038
Distance. Average commute-time is the average number of                   records in the user-group matrix which sparsity is 98.89%.
steps that a random walker, starting in node i (i 6= j), will             The total number of friendship of 943 selected users is 33776,
take to enter node j for the first time and go back to i,                 which means each user on average has 35.8 friends. Please
represented as n(i, j). Average commute-time is symmetric                 note that the rating matrix here is the user-artist matrix,
by definition, and it is a distance measure. n(i, j)1/2 is also           and if a user listened to song(s) of an artist, there is ”1” in
a distance in Euclidean space, and it is named as Euclidean               the corresponding cell.
Commute-Time Distance.                                                       By means of 5 fold cross-validation 2 , each row (represent
   L+ is the Moore-Penrose pseudoinverse of the Laplacian                 a user) of the user-artist matrix is randomly split into five
matrix L. The Laplacian matrix L of the graph is defined                  different sets. For each time of experiment, four-fifths of of
as, L =                                                                   the data is included in the training set and the other is used
       PnD - A, where D = Diag(ai. ), with dii = [D]ii =                  as the testing data.
ai. =     j=1 aij , and A is the adjacent matrix of graph G.
Let e be a column vector with 1(i.e., e = [1, 1, . . . , 1]T , where
T denotes the matrix transpose), then L+ can be computed                  4.2      Evaluation Metrics
with the formula:                                                            We adopted standard metrics in the area of information
                                                                          retrieval to evaluate our recommenders. During each round
                 L+ = (L − eeT /n)−1 + eeT /n,                     (7)    of cross-validation, we recommend and rank a set of poten-
where n is the number of nodes. If we define ei as the                    tial artists for each user. We then compare the predicted rec-
ith column of I, ei = [0, . . . , 0, 1, 0, . . . , 0]T (1 is in the ith   ommendation list with true preferences on artists in the test
column), then we can explain the relation between average                 set, and compute precision, recall, and F-measure scores.
commute-time and L+ in the form
                                                                               1. Recall. The score measures the average (on all users)
               n(i, j) = VG (ei − ej )T L+ (ei − ej ),             (8)            of the proportion (in percentages) of artists from the
where each node i is represented by a unit vector ei in the                       testing sets that appear among the top n ranked list
node space spanned by {ei }. It can be proved that the node                       from the training sets, for some given n. It should be as
vector ei can be transformed into a new Euclidean space, and                      high as possible for good performance. We computed
the elements of L+ (lij ) are the inner products between these                    the recall from Top-1 to Top-20 artists(for a total of
transformed node vectors. Therefore L+ is a kernel matrix(a                       1001 artists).
Gram matrix) and can be used as a similarity matrix for the
nodes.                                                                         2. Precision. This metric measures the proportion of
   According to Fouss’ study, the inner-product based sim-                        recommended items that are ground truth items. Note
ilarity measure L+ provides better and more stable results                        that the items in the profiles of the testing data rep-
for collaborative recommenders, so we adopt L+ metric to                          resent only a fraction of the items that the user truly
measure the similarity between users, which means in equa-                        accessed.
tion 5, sim(ui , uj ) equals to lij in this case.
                                                                          1
                                                                              The web page is http://www.last.fm/api
                                                                          2
                                                                              The number of fold is the number of tested sets.
  3. F-measure. F-measure is the weighted harmonic mean                                             Fusing friendship on User-Item Matrix
     of precision and recall. The equation is as follows:                                      15
                                                                                                               friendship fusion
                        2 ∗ precision ∗ recall                                                 14
                    F =                                     (9)
                        (precision + recall)
                                                                                               13


                                                                                F-Measure(%)
  In the following, we report the results of recommending
                                                                                               12
the top 1, 2, 5, 10 and 20 artists. At each pass, 50 users are
taken as neighbors based on different similarity measures for                                  11
recommendation.
                                                                                               10
4.3    Experiment Design and Results
  At first, we evaluated the fusion approach via weighted-                                      9
similarity. A user-based collaborative filtering recommender
                                                                                                8
was first run on the user-artist rating matrix, resulting in the                                    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
baseline represented by CFU I (see Tables 1 to 3 respectively                                                      Lambda
showing precision, recall and F-measure scores).

                                                                   Figure 3: F-measure changes as lambda change in
Table 1: Precision of Fusion via Weighted-Similarity               friendship fusion
Approaches
 Precision           Top 1    Top 2    Top 5    Top 10    Top 20
 CFU I               29.93    25.12    19.09    15.23     11.33                                      Fusing member on User-Item Matrix
 W Sf ri+U I         29.67    25.33    19.64    15.51     11.48                                15
                                                                                                            membership fusion
 W Smem+U I          29.44    25.49    19.58    15.33     11.40
                                                                                               14
 W Sf ri+mem+U I     29.37    25.45    19.58    15.34     11.42
                                                                                               13


                                                                                F-Measure(%)
                                                                                               12
Table 2: Recall of Fusion via Weighted-Similarity
                                                                                               11
Approaches
 Recall              Top 1    Top 2    Top 5    Top 10    Top 20
                                                                                               10
 CFU I               3.87     6.50     12.36    19.72     29.34
 W Sf ri+U I         3.84     6.56     12.71    20.07     29.73                                 9
 W Smem+U I          3.81     6.60     12.67    19.84     29.52
 W Sf ri+mem+U I     3.80     6.59     12.68    19.86     29.58                                 8
                                                                                                    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
                                                                                                                   Lambda


Table 3:    F-measure of Fusion via Weighted-                      Figure 4: F-measure changes as lambda change in
Similarity Approaches                                              membership fusion
 F-measure           Top 1    Top 2    Top 5    Top 10    Top 20
 CFU I               6.85     10.33    15.01    17.19     16.35
 W Sf ri+U I         6.80     10.42    15.43    17.50     16.56
                                                                      Next, we fused membership with rating matrix and also
 W Smem+U I          6.75     10.49    15.38    17.29     16.45    used the parameter λ to adjust the weight of rating matrix
 W Sf ri+mem+U I     6.73     10.47    15.39    17.31     16.48    and membership in the similarity calculation.
                                                                      From the figure 4, it can be seen that when λ = 0.8, we
  Then, we tried to fuse friendship with the rating matrix         can get best results overall. The third rows of Tables 1 to
and used λ to adjust the weight of rating matrix and friend-       3, represented by W Smem+U I , respectively show precision,
ship while computing user similarity. Figure 3 shows how           recall and F-measure values when λ = 0.8. It shows that
F-measure changes with the changing of λ. It achieves the          the results slightly improve on the baseline, but are a little
peak when λ = 0.7, which means that the rating matrix              weaker than the fusion of friendship on the rating matrix.
contributes to 70% percent of the weight while friendship             We further fused two relationships together on the rat-
contributes to 30% in calculating the user-user similarity.        ing matrix and adopted two parameters: λ and β to adjust
Specifically, results in the second rows of Tables 1 to 3,         the weights for the three data sources as shown in Formula
represented by W Sf ri+U I , give the precision, recall and F-     4. Experimental results indicate that the hybrid fusion per-
measure scores when λ = 0.7. It can be seen that it is             forms best when lambda = 0.8 and beta = 0.5 (which means
better than the baseline, especially when returning top 5          rating matrix contributes 80%, friendship and membership
recommendations (the improvement of F-measure achieved             respectively contributes 10% and 20% in the user-user sim-
up to 2.79%). In the process of tuning λ and β in the fol-         ilarity calculation). These results are illustrated in the last
lowing, we considered the average score of Top-1 to Top-20         rows of Tables 1 to 3. To our surprise, compared to fusion of
recommendations.                                                   friendship and membership separately, it did not have dis-
                                                           Weighted-Similarity Fusion: Rating + Friendship
                                           12            Weighted-Similarity Fusion: Rating + Membership              Table 5: Table of improvements on F-Measure by
     Percentage of F-measure improvement        Weighted-Similarity Fusion: Rating+Friendship+Membership
                                                          Graph Fusion: Rating + Friendship + Membership
                                                                                                                      Comparing with the Baseline
                                           10
                                                                                                                       Improvements         Top 1   Top 2    Top 5    Top 10    Top 20
                                            8                                                                          W Sf ri+U I          -0.73   0.87     2.80     1.80      1.28
                                            6
                                                                                                                       W Smem+U I           -1.46   1.55     2.47     0.58      0.61
                                                                                                                       W Sf ri+mem+U I      -1.75   1.36     2.53     0.70      0.80
                                            4                                                                          Gf ri+mem+U I        4.82    7.94     7.99     4.25      4.04
                                            2

                                            0                                                                         menting recommendations, and particularly that the graph-
                                           -2                                                                         based fusion is more effective in bringing into play of the
                                                         1           2           5        10           20             power of social data. To the best of our knowledge, the
                                                                  Size of Top-N recommendations
                                                                                                                      work is one of the first attempts to explore the effect of
                                                                                                                      membership in addition to friendship, and to fuse both of
Figure 5: Improvements on F-Measure of All the                                                                        them based on random walk graph model with collaborative
Fusion Approaches                                                                                                     filtering (CF) systems.
                                                                                                                         For the next step, we are interested in further exploring
                                                                                                                      the impact of social relationships on recommender systems
tinct difference and was actually even a little weaker than
                                                                                                                      from three aspects: one is to explore other potential re-
the pure fusing of friendship.
                                                                                                                      lationships, such as the relation between items and asso-
   In order to identify whether the fusion via graph model
                                                                                                                      ciated groups, other social relationships besides friendship
would perform better than via weighted similarity approach
                                                                                                                      and membership, such as the reporting chain in a company,
given that social data are inherently in the graph structure,
                                                                                                                      to see how to model and utilize these data in order to make
we finally did the experiment of fusing the two social rela-
                                                                                                                      better recommendations; another direction is to explore how
tionships on the graph and applied random walk algorithm
                                                                                                                      to enhance Random Walk model so as to handle heteroge-
to calculate neighborhood similarity.
                                                                                                                      neous data in a more fine-grained way, based on the method
   Based on the graph construction method described before,
                                                                                                                      proposed in [25]; finally, as the explosion of the size of social
we firstly run a series of simulations to learn the optimal .
                                                                                                                      websites, we need to pay more attention to the algorithm’s
After running 100 times, the optimal  that we got is 0.05.
                                                                                                                      scalability and efficiency, when the social graph grows with
The comparative results under threshold 0.05 are then com-
                                                                                                                      millions of nodes.
puted and listed in Table 5. It shows that precision, recall
and F-measure scores are all highly improved compared to
the fusion via weighted-similarity approach. In particular,                                                           6.   ACKNOWLEDGMENTS
when returning top 2 and top 5 recommendations, the im-                                                                 We thank Lawrence Bergman of IBM T.J. Watson Re-
provements significantly reached 7.94% and 7.99% respec-                                                              search Center for his generous help and valuable comments
tively.                                                                                                               on this work.


Table 4:                                         Fusing friendship and membership via
                                                                                                                      7.   REFERENCES
Graph                                                                                                                  [1] C. Lam. Snack: incorporating social network
 Gf ri+mem+U I                                               Top 1      Top 2         Top 5         Top 10   Top 20        information in automated collaborative filtering. In
 Precision                                                   31.30      27.12         20.62         15.88    11.79         EC ’04: Proceedings of the 5th ACM conference on
 Recall                                                      4.05       7.02          13.35         20.55    30.54         Electronic commerce, pages 254–255, New York, NY,
 F-Measure                                                   7.18       11.15         16.21         17.92    17.01         USA, 2004. ACM.
                                                                                                                       [2] E. Spertus, M. Sahami, and O. Buyukkokten.
                                                                                                                           Evaluating similarity measures: a large-scale study in
  Figure 5 further shows that while returning Top-1 recom-
                                                                                                                           the orkut social network. In KDD ’05: Proceedings of
mendation, the fusion via graph can achieve an improvement
                                                                                                                           the eleventh ACM SIGKDD international conference
at 4.82%, and others are however all slightly weaker than
                                                                                                                           on Knowledge discovery in data mining, pages
the baseline. When returning top N recommendations from
                                                                                                                           678–684, New York, NY, USA, 2005. ACM.
2 to 10, all of the fusion approaches enhance the baseline
                                                                                                                       [3] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik,
CF method, which positively proves the usefulness of social
                                                                                                                           S. Kumar, D. Ravichandran, and M. Aly. Video
relationship data especially when multiple recommendations
                                                                                                                           suggestion and discovery for youtube: taking random
are computed. It also indicates that the fusion via graph can
                                                                                                                           walks through the view graph. In J. Huai, R. Chen,
boost the baseline significantly by up to 8%, demonstrating
                                                                                                                           H.-W. Hon, Y. Liu, W.-Y. Ma, A. Tomkins, and
that the graph model is a more proper way to fuse explicit
                                                                                                                           X. Zhang, editors, WWW, pages 895–904. ACM, 2008.
social relationships.
                                                                                                                       [4] D. Boyd. Friends, friendsters, and myspace top 8:
                                                                                                                           Writing community into being on social network sites.
5.   CONCLUSION AND FUTURE WORK                                                                                            http://www.firstmonday.org/issues/issue11 12/boyd/
  In this paper, we presented two principal methods to inte-                                                               index.html, December 2006.
grate explicit social relationships into traditional CF meth-                                                          [5] N. Baym. How Good A Friend Is A Last.fm Friend?
ods: the weighted-similarity fusion and the graph fusion. We                                                               http://www.last.fm/user/popgurl/journal/2008/04/28/
demonstrated the effectiveness of social relationships in aug-                                                             3d9l how good a friend is a last.fm friend.
 [6] M. Balabanović and Y. Shoham. Fab: content-based,         [21] I. Konstas, V. Stathopoulos, and J. M. Jose. On social
     collaborative recommendation. Commun. ACM,                      networks and collaborative recommendation. In SIGIR
     40(3):66–72, 1997.                                              ’09: Proceedings of the 32nd international ACM
 [7] P. Melville, R. J. Mooney, and R. Nagarajan.                    SIGIR conference on Research and development in
     Content-boosted collaborative filtering for improved            information retrieval, pages 195–202, New York, NY,
     recommendations. In Eighteenth national conference              USA, 2009. ACM.
     on Artificial intelligence, pages 187–192, Menlo Park,     [22] I. Guy, I. Ronen, and E. Wilcox. Do you know?:
     CA, USA, 2002. American Association for Artificial              recommending people to invite into your social
     Intelligence.                                                   network. In IUI ’09: Proceedings of the 13th
 [8] M. J. Pazzani. A framework for collaborative,                   international conference on Intelligent user interfaces,
     content-based and demographic filtering. Artif. Intell.         pages 77–86, New York, NY, USA, 2009. ACM.
     Rev., 13(5-6):393–408, 1999.                               [23] J. Chen, W. Geyer, C. Dugan, M. Muller, and I. Guy.
 [9] J. S. Breese, D. Heckerman, and C. M. Kadie.                    Make new friends, but keep the old: recommending
     Empirical analysis of predictive algorithms for                 people on social networking sites. In CHI ’09:
     collaborative filtering. In G. F. Cooper and S. Moral,          Proceedings of the 27th international conference on
     editors, Proceedings of the 14th Conference on                  Human factors in computing systems, pages 201–210,
     Uncertainty in Artificial Intelligence, pages 43–52,            New York, NY, USA, 2009. ACM.
     1998.                                                      [24] J. Wang, A. P. de Vries, and M. J. T. Reinders. A
[10] H. Ma, H. Yang, M. R. Lyu, and I. King. Sorec: social           user-item relevance model for log-based collaborative
     recommendation using probabilistic matrix                       filtering. In M. Lalmas, A. MacFarlane, S. M. Rüger,
     factorization. In CIKM ’08: Proceeding of the 17th              A. Tombros, T. Tsikrika, and A. Yavlinsky, editors,
     ACM conference on Information and knowledge                     ECIR, volume 3936 of Lecture Notes in Computer
     management, pages 931–940, New York, NY, USA,                   Science, pages 37–48. Springer, 2006.
     2008. ACM.                                                 [25] J. Zhang, J. Tang, B. Liang, Z. Yang, S. Wang, J. Zuo,
[11] J. Golbeck. Generating predictive movie                         and J. Li. Recommendation over a heterogeneous
     recommendations from trust in social networks. pages            social network. Web-Age Information Management,
     93–104. 2006.                                                   International Conference on, 0:309–316, 2008.
[12] P. G. Doyle and J. L. Snell. Random walks and
     electric networks, 2000.
[13] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens.
     Random-walk computation of similarities between
     nodes of a graph with application to collaborative
     recommendation. Knowledge and Data Engineering,
     IEEE Transactions on, 19(3):355–369, March 2007.
[14] M. Gori and A. Pucci. Itemrank: A random-walk
     based scoring algorithm for recommender engines. In
     M. M. Veloso, editor, IJCAI, pages 2766–2771, 2007.
[15] G. Groh and C. Ehmig. Recommendations in taste
     related domains: collaborative filtering vs. social
     filtering. In GROUP ’07: Proceedings of the 2007
     international ACM conference on Supporting group
     work, pages 127–136, New York, NY, USA, 2007.
     ACM.
[16] R. Gross, A. Acquisti, and J. H. Heinz. Information
     revelation and privacy in online social networks. In
     WPES ’05: Proceedings of the 2005 ACM workshop on
     Privacy in the electronic society, pages 71–80, New
     York, NY, USA, 2005. ACM Press.
[17] U. von Luxburg. A tutorial on spectral clustering.
     Statistics and Computing, 17(4):395–416, December
     2007.
[18] Z. Huang. Graph-based analysis for e-commerce
     recommendation. PhD thesis, Tucson, AZ, USA, 2005.
     Adviser-Hsinchun Chen and Adviser-Daniel D. Zeng.
[19] H. G. Hummel, B. van den Berg, A. J. Berlanga,
     H. Drachsler, J. Janssen, R. Nadolski, and R. Koper.
     Combining social-based and information-based
     approaches for personalised recommendation on
     sequencing learning activities. International Journal of
     Learning Technology, 3:152–168(17), 12 August 2007.
[20] J. G. Kemeny and J. L. Snell. Finite Markov Chains.
     Springer-Verlag, 1976.