Improving Recommendation Accuracy by Clustering Social Networks with Trust

Tom DuBois, Computer Science Department, University of Maryland, College Park, MD 20741, tdubois@cs.umd.edu
Jennifer Golbeck, Human-Computer Interaction Lab, University of Maryland, College Park, MD 20741, jgolbeck@umd.edu
John Kleint, Computer Science Department, University of Maryland, College Park, MD 20741, jk@cs.umd.edu
Aravind Srinivasan, Computer Science Department, University of Maryland, College Park, MD 20741, srin@cs.umd.edu

ABSTRACT

Social trust relationships between users in social networks speak to the similarity in opinions between the users, both in general and in important nuanced ways. They have been used in the past to make recommendations on the web. New trust metrics allow us to easily cluster users based on trust. In this paper, we investigate the use of trust clusters as a new way of improving recommendations. Previous work on the use of clusters has shown the technique to be relatively unsuccessful, but those clusters were based on similarity rather than trust. Our results show that when trust clusters are integrated into memory-based collaborative filtering algorithms, they lead to statistically significant improvements in accuracy. In this paper we discuss our methods, experiments, results, and potential future applications of the technique.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous

General Terms

Keywords

recommender systems, trust, social networks

1. INTRODUCTION

Trust between users in a social network indicates similarity in their opinions [25, 24, 11].
Hundreds of millions of people are members of social networks online [10], and many of those networks contain trust data. With access to this information, trust has the potential to improve the way recommendations are made.

Existing work has used trust to make recommendations by treating it as a weight in collaborative filtering algorithms [3, 9, 15]. This has been effective, but it is not the only way in which trust can be used. If users can be clustered into trusted groups, these may be useful for improving the quality of recommendations made with a variety of algorithms. While clustering can be computationally difficult with social networks, a new trust metric we have developed makes it easy to apply clustering techniques to build these groups. Since trust is related to similarity, we use correlation clustering to identify groups of trusted people.

This leads to the question: since we know that trust can be useful for making recommendations directly, and if we can effectively cluster users by trust, can those clusters be used to improve recommendation accuracy further? There has been some work on using clusters for improving recommendations already. These techniques cluster based on similarity, much as collaborative filtering techniques rely on user similarity. Unfortunately, research has not found an improvement in accuracy using these methods and, in many cases, the recommendation techniques using clusters actually lead to worse performance.

Previous experiments with trust have shown that it leads to improvements over similarity-based recommendation techniques in certain cases [8], and that user-assigned trust ratings capture sophisticated types of similarity [11]. Thus, it is possible that trust clusters may have benefits that are not found when similarity-based clusters are used. Our results show that incorporating trust clusters into collaborative filtering algorithms - including algorithms that use Pearson correlation coefficients and algorithms that use trust - does indeed lead to statistically significant improvements.

In this paper, we present our new trust metric and its applications to clustering, the integration of these clusters into recommendation algorithms, the experiments where we tested these algorithms and found the improvement in accuracy, and finally a range of applications where this technique may be beneficial.

2. RELATED WORK

User clustering has been applied to the task of collaborative filtering before, but our work is the first that uses trust as a basis for forming clusters. In this section, we describe work that has been done on using clustering for collaborative filtering and on using trust for collaborative filtering. Then, we introduce the dataset used in our computations and experiments.

2.1 Clustering and Collaborative Filtering

Breese et al. [5] used a Bayesian clustering model to cluster users based on their ratings. Their work showed mixed results; in some cases the clustering approach was competitive in terms of the accuracy of the ratings, and in others it performed poorly. Ungar and Foster [23] also used a Bayesian approach to cluster users based on their preferences. Their results also showed that clustering users was not a particularly successful approach. Graph-theoretic methods for clustering users based on preferences were discussed in [19]; however, the authors do not evaluate the impact these clusters have on recommendation accuracy or quality. Finally, in [22] users were clustered using a scalable neighborhood algorithm and, once again, the clustering approach had a higher MAE than the standard collaborative filtering method.
2.2 Trust and Collaborative Filtering

Social networks, and trust in particular, have been used to generate recommendations for users. In these cases, trust is used directly to generate the recommendation. This work follows from the fact that people tend to develop connections with people who have similar preferences [1]. Trusting the opinion of another particularly speaks to this type of similarity. The applicability of this effect to recommender systems has been established in several papers. Ziegler and Lausen [25] showed a correlation between trust and user similarity in an empirical study of a real online community, All Consuming (http://allconsuming.net/), where users rate books. The authors showed that users were significantly more similar to their trusted peers than to the population as a whole. This work was extended in [24], which augmented the analysis of the All Consuming community and added a second analysis. That second result in [24] used the FilmTrust system [9] (described below), where users have stated how much they trust their friends in a social network and have also rated movies. Within that community, results also showed a strong correlation between trust and similarity in movie ratings. Further work in [11] shows that trust captures similarity in more nuanced ways, such as similarity on items with extreme ratings and large differences.

Empirical results show that using trust from social networks can improve recommendations. O'Donovan and Smyth [20] performed an analysis of how trust impacts the accuracy of recommender systems. Using the MovieLens dataset [18], they created trust values by estimating how accurately one person predicted the preferences of another. Those trust values were then used in connection with a traditional collaborative filtering algorithm [14], and an evaluation showed significant improvement in the accuracy of the recommendations. Massa and Bhattacharjee [16] also conducted a study on the applicability of trust in recommender systems. Their study relied on user ratings of products and trust ratings of other users from Epinions (http://epinions.com) as their dataset. Using a trust propagation algorithm similar to that described in section 3, they showed that trust-based recommendations could perform significantly better than those based on similarity alone.

In the FilmTrust recommender system mentioned above, trust is used in place of the Pearson correlation coefficient to generate predictive ratings. Results showed that when the user's rating of a movie is different than the average rating, it is likely that the recommended rating will more closely reflect the user's tastes. As the magnitude of this difference increases, the benefit offered by the trust-based recommendation also increases. Moleskiing [3], at http://moleskiing.it, is another real system built to utilize trust ratings in a recommender system. Using a similar approach, it recommends routes to users based on information supplied by trusted peers.

2.3 Dataset

For these experiments, we needed our dataset to have two components:

1. A social network with trust ratings between individuals, so we could apply the trust inference algorithm and clustering methods discussed in section 3.

2. Ratings of items by the members of that social network.

We used the FilmTrust dataset [9] for these experiments because it had both these features - a trust network and a set of ratings on movies. Because trust assigned by one user to another is a value that is kept very private, there are no other publicly available datasets with this information. Thus, while FilmTrust provides a good basis for this initial analysis, further analysis with privately held trust networks will likely lead to additional insights.

At the time of analysis, the FilmTrust movie rating dataset had 29,551 ratings by 1,254 unique users over 1,946 movies. That is an average of 15.2 ratings per movie, though some movies have hundreds of ratings. Trust values are assigned on a 1 to 10 scale and are asymmetric; Alice may trust Bob at level n, but Bob may have no trust or a different level of trust in return for Alice. The entire FilmTrust social network has 712 nodes with 1,465 edges, and the average trust rating is 6.83. Many of these nodes are in small groups of two or three disconnected from the main component. For our algorithms, we selected the giant component and removed nodes with a degree of 1. This left 348 nodes with 1,059 edges in the FilmTrust network. Since our recommendation technique required nodes to be in the social network, we used only the ratings from these 348 nodes. They had 8,457 ratings on 1,558 movies.
3. A PROBABILISTIC TRUST INFERENCE ALGORITHM

As part of our previous work [6], we developed a probabilistic trust inference algorithm that lends itself naturally to clustering applications. In this section we present an overview of that work and discuss the clustering techniques used.

3.1 The Trust Inference Algorithm

Our work takes a trust network, which may be very sparse since most people will know only a small fraction of the network, and generates inferred trust values between all pairs in the network. We then use these trust values as the basis of a trust distance metric space, where the more trust between a pair, the closer they are in the space. One of the major benefits of our approach, and the benefit we exploit here, is the ability to group people into clusters within this metric space. In this section we present our probabilistic trust inference algorithm and describe the properties of its output that make it easy to cluster.

Axioms of Inferred Trust

A very intuitive idea motivates this trust inference model. Consider the following scenario:

• Alice knows Bob and thinks he has a p_{a,b} chance of being trustworthy.

• Bob knows Eve and thinks she has a p_{b,e} chance of being trustworthy, and he tells this to Alice if he is trustworthy. If Bob is not trustworthy, he may lie about p_{b,e} and give any value to Alice.

• Alice reasons that Eve is trustworthy if Bob is trustworthy, gives her the correct value p_{b,e}, and Eve is trustworthy with respect to Bob.

• This combination happens with probability p_{a,b} p_{b,e} if Bob's trustworthiness and Eve's trustworthiness are independent.

Thus we infer that Alice's trust in Eve should be p_{a,b} p_{b,e}. More formally, we view any path through the network as a Bayesian chain. Define X_Bob and X_Eve to be the respective random events that Bob and Eve are trustworthy from Alice's perspective. This is explained in more detail in Figure 1.

Figure 1: Pr[X_Eve] = Pr[X_Eve | X_Bob] Pr[X_Bob] + Pr[X_Eve | ¬X_Bob] Pr[¬X_Bob] ≥ Pr[X_Eve | X_Bob] Pr[X_Bob] = Pr[X_Bob ∧ X_Eve]. The second term drops out because Alice has no information about Eve if Bob is not trustworthy. Furthermore, if Eve and Bob are independent, this probability becomes Pr[X_Bob] Pr[X_Eve].

The same analysis can be used if trust is a proxy for similarity. Specifically, Alice and Bob's mutual trust can be a measure of how similar their tastes in movies are. If trust is interpreted as the probability of liking the same film, then Alice will agree with Eve about a movie if (but not necessarily only if) Alice and Bob agree on it and Bob and Eve agree as well.

Table 1: These rules should apply to any pessimistic system which derives inferred trust from direct trust information.

Local Pessimism: Since t_{u,v} is a pessimistic estimate, indirect information can only increase trust; thus T_{u,v} ≥ t_{u,v}.
Bottleneck: If all paths from u to v use (a, b), then T_{u,v} ≤ t_{a,b}, and in general the lower t_{a,b} is, the lower T_{u,v} should be.
Identity: Individuals should completely trust themselves: T_{u,u} = T_max.
Complete Trust: If there exists a path (a_0, a_1, ..., a_n) such that for all i from 1 to n, t_{a_{i-1},a_i} = T_max, then T_{a_0,a_n} = T_max.
Monotonicity: For any u, v such that T_{u,v} < T_max, augmenting the graph with a new trust path from u to v, or increasing a t_{a,b} value along an existing trust path, should increase T_{u,v}.
No Trust: For any u, v with no path from u to v, T_{u,v} = 0.
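As a concrete illustration of the chain reasoning above (the numbers here are illustrative, not taken from the FilmTrust data): suppose Alice trusts Bob with probability p_{a,b} = 0.8 and Bob trusts Eve with probability p_{b,e} = 0.9. Then along this single chain,

```latex
\Pr[X_{\mathrm{Bob}} \wedge X_{\mathrm{Eve}}] = p_{a,b}\, p_{b,e} = 0.8 \times 0.9 = 0.72,
```

so Alice infers trust 0.72 in Eve from this path, which by the derivation in Figure 1 is a lower bound on Pr[X_Eve].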
Our model would not be interesting if it required only simple probabilities which can be computed exactly. Fortunately, we can quickly estimate trust between individuals in a more complicated network, one with exponentially many, highly correlated paths between pairs of nodes. In these cases, the Bayesian chain view still applies. If there exists a path from Alice to Eve in a random network constructed from the trust values, then that path is a chain of people from Alice to Eve who each trust their successor, and Alice can trust Eve. Therefore Alice trusts Eve with the probability that there is a path from Alice to Eve in the random graph.

We define t_{u,v} to be the direct trust between u and v, and T_{u,v} to be our inferred trust value. The direct trust values may be arbitrary; however, the inferred trust should obey the axioms in Table 1.

The idea that trust networks can be treated as random graphs underlies this algorithm. For every pair (u, v), we place an edge between them with some probability that depends on t_{u,v}. We then infer trust between two people from the probability that they are connected in the resulting graphs. Formally, we choose a mapping f from trust values to probabilities. We then create a random graph G where each edge (u, v) exists with probability f(t_{u,v}). We then use this graph to generate inferred trust values T_{u,v} such that f(T_{u,v}) equals the probability that there is a path from u to v in the random graph. We give a small illustrative example graph in Figure 2. This model is one of many that satisfies our trust axioms.

Figure 2: This is an example network with a critical edge. No one from the set {a, b, c} can trust anyone in {d, e, f} except through the mutual trust between c and d.
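Since exact path-connection probabilities are hard to compute when there are many correlated paths, they can be approximated by sampling. The sketch below is a minimal illustration of the random-graph interpretation just described, not the implementation from [6]; the toy network and the mapping f(t) = t/10 (one of the mappings shown in Figure 3) are assumptions made for the example.

```python
import random
from collections import defaultdict, deque

def sample_reachable(edge_prob, source):
    """One Monte Carlo sample: keep each undirected edge (u, v) independently
    with probability edge_prob[(u, v)], then return the set of nodes reachable
    from `source` in the sampled graph (BFS)."""
    kept = defaultdict(set)
    for (u, v), p in edge_prob.items():
        if random.random() < p:
            kept[u].add(v)
            kept[v].add(u)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in kept[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def estimate_connection_prob(nodes, direct_trust, f, source, samples=2000):
    """Estimate f(T_{source,v}) for every node v as the fraction of sampled
    random graphs in which v is reachable from `source`."""
    edge_prob = {edge: f(t) for edge, t in direct_trust.items()}
    hits = defaultdict(int)
    for _ in range(samples):
        for v in sample_reachable(edge_prob, source):
            hits[v] += 1
    return {v: hits[v] / samples for v in nodes}

if __name__ == "__main__":
    # Toy network with direct trust on a 1-10 scale (illustrative values only).
    direct_trust = {("alice", "bob"): 8, ("bob", "eve"): 9, ("alice", "carol"): 5}
    nodes = {"alice", "bob", "carol", "eve"}
    f = lambda t: t / 10.0  # assumed trust-to-probability mapping
    print(estimate_connection_prob(nodes, direct_trust, f, "alice"))
```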
3.2 Trust Based Clustering

Given that f(T_{u,v}) is the probability of a path connecting u and v, the function d(u, v) = log(1 / f(T_{u,v})) defines a metric space on the nodes because it satisfies four conditions:

• d(u, v) ≥ 0

• d(u, v) = d(v, u) (though this condition is not necessary for asymmetric metrics)

• d(u, u) = 0

• d(u, v) + d(v, w) ≥ d(u, w)

Since we have a metric space on the nodes in which the further apart two nodes are, the lower the probability of a path between them, we can make use of existing metric clustering algorithms to partition the nodes into groups. A clustering algorithm takes a set of points in a metric space and groups them in a way that tries to optimize some criteria. Examples include k-centers, which finds a set S of k points that minimizes the distance from any point to its closest point in S; k-means, which partitions the points into k sets in a way that minimizes the variance within each group; and correlation clustering, which partitions the points in a way that minimizes the sum of distances within groups minus the sum of distances across groups. Each of these clustering problems has good approximation algorithms when applied to points in a symmetric metric space [12, 13, 4], and some even have good approximations in an asymmetric metric space [2].

While any of these clusterings could be applied, we focus on a variant of correlation clustering. Its goal - finding clusters that maximize agreement within and minimize agreement between clusters - fits naturally with our application. Also, unlike most clustering algorithms, no k representing the number of clusters is provided as input, since optimizing agreement is independent of the number of clusters. In our application, the trust value from one node to another can be treated as a measure of similarity, with high trust indicating agreement and low trust indicating disagreement. Using the complete graph output by the trust inference algorithm, we can perform a correlation clustering over the graph, grouping people together who have more trust for one another.

Figure 3 shows the results from applying our algorithm to the FilmTrust network. The distance grid shows one large mutually trusting group, as well as several progressively smaller mutually trusting groups. The largest of the groups is trusted by a large portion of the network. The second largest group is well trusted by this largest group. Beyond that, the plot where f(t) = t/15 brings out the most difference within the groups.

Figure 3: Here we show distance grids and partitionings for the FilmTrust dataset, along with the color code for the grids (panels correspond to f(t) = t/10, f(t) = t/15, and the transitive closure). The (i, j) pixel in each grid encodes the distance between the ith and jth people according to our metric. The color code is given by the bottom rightmost image, with red being 0 distance and violet being distances of 10 and above. The rightmost grid is not probabilistic, but instead shows the transitive closure of the edge set.

Finding an optimal correlation clustering is NP-hard [7], but there are efficient constant-factor approximation algorithms. We use a variant in which, while there are nodes left to cluster, we choose one at random to be a "center" and create a new cluster out of all nodes within a fixed radius of this center. Since this algorithm is randomized, the output can vary from one execution to another. Thus, in clustering our inferred trust network, we ran several iterations of the algorithm to produce a representative set of clusters to work with.
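The randomized variant described above can be sketched as follows. This is an illustration under our own assumptions about the data layout (path_prob holds estimated values of f(T_{u,v}), distances come from the metric d(u, v) = log(1 / f(T_{u,v})) defined earlier in this section, and already-clustered nodes are not reassigned), not the code used in our experiments.

```python
import math
import random

def trust_distance(path_prob, u, v):
    """d(u, v) = log(1 / f(T_{u,v})), taken as infinite when the estimated
    probability of a path between u and v is zero."""
    if u == v:
        return 0.0
    p = path_prob.get((u, v), 0.0)
    return math.inf if p == 0.0 else math.log(1.0 / p)

def radius_cluster(nodes, path_prob, radius, seed=None):
    """Randomized ball-growing clustering: while nodes remain, pick one at
    random as a 'center' and form a new cluster from every remaining node
    within `radius` of it. Different runs can give different clusterings."""
    rng = random.Random(seed)
    unclustered = set(nodes)
    clusters = []
    while unclustered:
        center = rng.choice(sorted(unclustered))
        cluster = {v for v in unclustered
                   if trust_distance(path_prob, center, v) <= radius}
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

if __name__ == "__main__":
    # path_prob[(u, v)] holds the estimated f(T_{u,v}), e.g. from the
    # sampling sketch in Section 3.1 (illustrative values only).
    path_prob = {("a", "b"): 0.9, ("b", "a"): 0.9,
                 ("a", "c"): 0.8, ("c", "a"): 0.8,
                 ("a", "d"): 0.1, ("d", "a"): 0.1}
    print(radius_cluster(["a", "b", "c", "d"], path_prob, radius=1.0, seed=7))
```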
In order to obtain clusters for our recommender system, we ran the correlation clustering algorithm on the network output by running our trust inference algorithm over the FilmTrust data. We used maximum radii of 1, 2, and 3, and ran six iterations of the algorithm for each. With this dataset, the algorithm generates one very large cluster, two or three medium-sized clusters, and many small clusters. Figure 4 shows the size of the six largest clusters in each iteration for all three maximum radii.

Figure 4: The size of the six largest clusters for each iteration of the correlation clustering algorithm. The purple/blue bars indicate when the algorithm was run with a maximum radius of 1, the red/orange bars a maximum radius of 2, and the green bars a radius of 3. Six iterations are shown as the bars within each color group.

4. EXPERIMENTS AND RESULTS

To test whether or not trust-based clusters drawn from social networks could be used to improve the quality of recommendations, we ran several experiments. In this section, we discuss the dataset, experimental design, and results, which show that clusters can indeed improve the accuracy of recommendations.

4.1 Recommendation Algorithms

There are many methods for generating predictive recommendations. Our goal in this work was not to create the next best recommendation algorithm but rather to demonstrate that using clusters based on trust has the potential to improve the accuracy of recommendations.

We used several basic recommendation algorithms to test our hypothesis. The first is a basic ACF algorithm that computes a weighted average of ratings using the Pearson correlation coefficient between the recommendee and the rater as a weight. To compute a recommendation, we required pairs of nodes to have at least four movies in common so we could compute a meaningful correlation coefficient. We also tested a trust-based recommender algorithm. There are a number of approaches to using trust for recommendations [9, 21, 17, 16], and we used a simple variation on user-user automated collaborative filtering (ACF), replacing the correlation coefficient with the inferred trust value computed using the method described above. Thus, people the recommendee trusts more will receive more weight. This approach has been used before and shown to produce results equivalent to ACF overall, and improved results in certain cases [9].

Both algorithms were modified to give more weight to ratings from nodes in the same cluster as the recommendee. Considering only ratings by nodes in the same cluster would exclude so much information that recommendations would suffer. However, if we believe that the clustered nodes are more valuable, we can give them more weight than would be afforded using only the trust value. In these experiments, we gave an additional 5% weight to nodes in the same cluster. All ratings for a movie were considered and weighted by the inferred trust from the recommendee to the rater. Ratings by nodes in the same cluster as the recommendee had their weight multiplied by 1.05. This approach was used with both the Pearson correlation-based ACF method and the trust-based recommendation.
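A minimal sketch of the weighting scheme just described, under stated assumptions: weight(recommendee, rater) stands in for either the Pearson correlation coefficient or the inferred trust value, cluster_of maps each user to a cluster identifier, and non-positive weights are simply skipped (a simplification). The function and parameter names are ours, not from the FilmTrust implementation.

```python
def predict_rating(recommendee, movie, ratings, weight, cluster_of,
                   cluster_boost=1.05):
    """Weighted average of all ratings for `movie`.

    ratings:    dict mapping rater -> {movie: rating}
    weight:     function (recommendee, rater) -> base weight; in the paper this
                is either the Pearson correlation or the inferred trust value
    cluster_of: dict mapping user -> cluster id (or None if unclustered)
    Raters in the same cluster as the recommendee get their weight multiplied
    by `cluster_boost` (5% extra weight in the experiments)."""
    num, den = 0.0, 0.0
    for rater, their_ratings in ratings.items():
        if rater == recommendee or movie not in their_ratings:
            continue
        w = weight(recommendee, rater)
        if w is None or w <= 0:  # simplification: skip non-positive weights
            continue
        same_cluster = (cluster_of.get(rater) is not None
                        and cluster_of.get(rater) == cluster_of.get(recommendee))
        if same_cluster:
            w *= cluster_boost
        num += w * their_ratings[movie]
        den += w
    return num / den if den > 0 else None
```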
4.2 Experimental Setup

To test our hypothesis that using the trust clusters improves the accuracy of recommendations, we ran the following process:

1. Select an iteration of the clustering algorithm to obtain clusters.

2. For each user-movie pair, generate a predictive rating for the movie in two ways:

(a) using the standard trust-based recommendation algorithm;

(b) using a modified trust-based recommendation algorithm that gives more weight to the nodes in the same cluster as the user, as described above.

3. Compare the MAE and RMSE for the two recommendation methods.

This was repeated for each iteration and configuration of the clustering algorithm and for all of the cluster-enhanced algorithms described above. The clustering algorithm produced several large clusters and many very small clusters. We used only clusters with five or more nodes. To run the experiments using trust-based recommendations, it was necessary that we had an inferred trust value between the considered nodes in the network. Thus, nodes that were outside the giant component and therefore had no inferred trust values were excluded. For consistency, we used the same set of nodes in all of our experiments.

4.3 Results

Our results showed that both cluster-enhanced recommendation methods (giving 5% extra weight to the ratings from people in the same cluster as the recommendee) offered a small but statistically significant improvement in accuracy over the algorithms that did not consider the clusters. All significance results were computed using a Student's t-test for paired samples and were significant for p < 0.05.

Since the correlation clustering algorithm is randomized, we ran six iterations of the algorithm to obtain a representative sample. We ran the experiment on each iteration and then took the average rating for each user-movie pair over the six iterations to compare to the known value and judge the impact of the cluster-enhanced approach. This ensured that we could see the true impact of the approach and not be misled by an unusually good or bad clustering.

Table 2 shows the results of our different cluster-enhanced algorithms on the dataset. For both the mean absolute error (MAE) and root mean squared error (RMSE), the algorithm that took advantage of the clusters significantly outperformed the control, which ignored clusters.

Figure 5: This illustrates the improvement in accuracy. The y-axis indicates Mean Absolute Error (MAE), so lower values are better. Blue bars indicate the results from the cluster-enhanced algorithms and red bars are the control. The numbers on the x-axis indicate the maximum radius of the clusters.

Table 2: Experimental Results of Cluster-Enhanced Recommendations. The cluster-enhanced method significantly outperforms the control in all cases.

Method  Cluster Radius  MAE (method)  MAE (control)  MAE p-value  RMSE (method)  RMSE (control)  RMSE p-value
ACF     1               0.53454       0.53460        0.0031       0.70199        0.70204         0.0273
ACF     2               0.53452       0.53460        0.0012       0.70196        0.70204         0.0022
ACF     3               0.53453       0.53460        0.0069       0.70194        0.70204         0.0004
Trust   1               0.63495       0.63501        0.0018       0.82614        0.82620         0.0076
Trust   2               0.63496       0.63501        0.0274       0.82615        0.82620         0.0493
Trust   3               0.63496       0.63501        0.0342       0.82614        0.82620         0.0356

4.3.1 Coverage

Clusters will not affect recommendations in all cases; it is possible that no one in the same cluster as the recommendee has rated the movie in question. In those cases, the result will be the same as with the control method. However, analysis shows that at least one person who rated the movie is in the cluster approximately 70% of the time, and thus the clustering technique will have an impact.
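The comparison reported in Table 2 can be reproduced in outline with a sketch like the following (our own code, not the evaluation scripts used here). It assumes two lists of signed prediction errors for the same user-movie pairs, one from the cluster-enhanced method and one from the control, and computes MAE, RMSE, and paired Student's t-tests as described above.

```python
import math
from scipy import stats

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def compare(method_errors, control_errors):
    """Paired comparison of two recommenders over the same user-movie pairs.
    Each list holds signed errors (predicted - actual), aligned pair by pair."""
    _, p_mae = stats.ttest_rel([abs(e) for e in method_errors],
                               [abs(e) for e in control_errors])
    # Paired t-test on squared errors: one plausible way to attach a p-value
    # to the RMSE comparison (the paper does not spell out this detail).
    _, p_rmse = stats.ttest_rel([e * e for e in method_errors],
                                [e * e for e in control_errors])
    return {"MAE (method)": mae(method_errors),
            "MAE (control)": mae(control_errors),
            "MAE p-value": p_mae,
            "RMSE (method)": rmse(method_errors),
            "RMSE (control)": rmse(control_errors),
            "RMSE p-value": p_rmse}
```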
5. DISCUSSION

These results show a small but statistically significant improvement in accuracy when correlation clusters generated from a trust network are incorporated into a recommender system. While the magnitude of the improvement is small, these results are promising when we consider that similarity-based clustering approaches typically perform significantly worse than their non-clustered counterparts.

Furthermore, we believe that the results of this approach will become more practically significant on larger social networks. Our method requires a social network with trust values and item ratings created by the people in the network. We ran these experiments on a network with 348 nodes, which is small relative to most web-based social networks (among the 250 social networks listed at http://trust.mindswap.org/, the mean size is over 4,600,000 and the median is 22,000). We used this network because it is the only one we had access to with the necessary data; since trust values must be kept private to be effective, other datasets are not made public by the large social networks that have them. This lack of public data does not limit the applicability of the technique; it can be applied within privately held networks where access to trust data is not a problem. The needed data exists internally on many large networks, such as Orkut. It can also be estimated from rating data on items a pair of users have in common [11].

With larger social networks, we will see more large clusters. Since the experimental network has one large trusted cluster and several smaller ones, our results show a significant improvement essentially from giving less weight to nodes outside the cluster. We expect to see this effect magnified when there are more large clusters.

The fact that considering trust clusters can improve recommendations also suggests that it has potential to help with other applications. By relying on connections in the social network, it is possible to eliminate many types of attacks or gaming in rating systems that rely on creating multiple accounts. While these accounts could all connect to one another with high trust, they would only be clustered with "good" users if some of these good users assigned them high trust ratings as well. However, previous work has shown that it is possible to eliminate these confused nodes from consideration [15]. These approaches together have the potential to very effectively eliminate forged ratings and reviews at the same time as they highlight those most relevant to the user.

5.1 Conclusions

Trust is strongly correlated with how similar two users are in their preferences. It reflects similarity in nuanced ways that have been shown to be useful for making recommendations. In this paper, we looked at taking trust a step further. We clustered users based on the trust between them using correlation clustering and then modified a collaborative filtering algorithm to use these clusters.

To test our approach, we used a traditional Pearson correlation collaborative filtering algorithm and a recommendation algorithm that used trust for generating recommendations independently of the clusters. In both, we modified the algorithms to give extra weight to ratings from nodes in the same cluster as the user for whom the rating was being generated. We compared the accuracy of these recommendations to those made by the unmodified version of each algorithm. In both cases, our results show a small but statistically significant improvement in the accuracy of recommendations when clusters are used.

This improvement is particularly interesting since previous work on clustering, which was based on user similarity, failed to outperform non-clustered methods and often performed significantly worse. It suggests that trust captures more sophisticated information about the similarity between two people and that it is particularly useful for highlighting the most relevant information in recommendation environments. We believe this effect will be magnified in bigger networks and that it has applications to limiting gaming and other attacks in online rating systems.

6. ACKNOWLEDGMENTS

Supported in part by NSF ITR Award CNS-0426683, NSF Award CNS-0626636, and DARPA.

7. REFERENCES

[1] A. Abdul-Rahman and S. Hailes. Supporting trust in virtual communities. In Proceedings of the 33rd Hawaii International Conference on System Sciences, 2000.

[2] A. Archer. Two O(log* k)-approximation algorithms for the asymmetric k-center problem. In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization, pages 1-14. Springer-Verlag, 2001.

[3] P. Avesani, P. Massa, and R. Tiella. Moleskiing.it: a trust-aware recommender system for ski mountaineering. International Journal for Infonomics, 2005.

[4] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. In Machine Learning, pages 238-247, 2002.

[5] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 43-52, 1998.

[6] T. DuBois, J. Golbeck, and A. Srinivasan. Rigorous probabilistic trust-inference with applications to clustering. In IEEE/WIC/ACM Conference on Web Intelligence, 2009.

[7] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, 1979.
[8] J. Golbeck. Computing and Applying Trust in Web-based Social Networks. PhD thesis, University of Maryland, College Park, MD, April 2005.

[9] J. Golbeck. Generating predictive movie recommendations from trust in social networks. In Proceedings of the Fourth International Conference on Trust Management, 2006.

[10] J. Golbeck. The dynamics of web-based social networks: Membership, relationships, and change. First Monday, 12(11), 2007.

[11] J. Golbeck. Trust and nuanced profile similarity in online social networks. ACM Transactions on the Web, in press.

[12] D. S. Hochbaum and D. B. Shmoys. Best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180-184, May 1985.

[13] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. A local search approximation algorithm for k-means clustering. In SCG '02: Proceedings of the Eighteenth Annual Symposium on Computational Geometry, pages 10-18, New York, NY, USA, 2002. ACM.

[14] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.

[15] R. Levien and A. Aiken. Attack-resistant trust metrics for public key certification. In 7th USENIX Security Symposium, pages 229-242, 1998.

[16] P. Massa and B. Bhattacharjee. Using trust in recommender systems: an experimental analysis. In Proceedings of the Second International Conference on Trust Management, 2004.

[17] M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In Proceedings of the Second International Semantic Web Conference, 2003.

[18] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and J. Riedl. MovieLens unplugged: experiences with an occasionally connected recommender system. In IUI '03: Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 263-266, New York, NY, USA, 2003. ACM.

[19] B. J. Mirza, B. J. Keller, and N. Ramakrishnan. Studying recommendation algorithms by graph analysis. Journal of Intelligent Information Systems, 20(2):131-160, 2003.

[20] J. O'Donovan and B. Smyth. Trust in recommender systems. In IUI '05: Proceedings of the 10th International Conference on Intelligent User Interfaces, pages 167-174, New York, NY, USA, 2005. ACM.

[21] J. O'Donovan and B. Smyth. Trust in recommender systems. In IUI '05: Proceedings of the 10th International Conference on Intelligent User Interfaces, pages 167-174, New York, NY, USA, 2005. ACM.
[22] B. M. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In Proceedings of the Fifth International Conference on Computer and Information Technology, pages 158-167, 2002.

[23] L. H. Ungar and D. P. Foster. Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems, pages 112-125, 1998.

[24] C.-N. Ziegler and J. Golbeck. Investigating correlations of trust and interest similarity. Decision Support Systems, 2006.

[25] C.-N. Ziegler and G. Lausen. Analyzing correlation between trust and user similarity in online communities. In Proceedings of the Second International Conference on Trust Management, 2004.