=Paper=
{{Paper
|id=Vol-532/paper-5
|storemode=property
|title=Social Trust as a Solution to Address Sparsity-inherent Problems of Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-532/paper5.pdf
|volume=Vol-532
}}
==Social Trust as a Solution to Address Sparsity-inherent Problems of Recommender Systems==
<pdf width="1500px">https://ceur-ws.org/Vol-532/paper5.pdf</pdf>
<pre>
     Social Trust as a solution to address sparsity-inherent
              problems of Recommender systems
                     Georgios Pitsilis                                                       Svein J. Knapskog
                      Q2S, NTNU                                                                 Q2S, NTNU
                O.S. Bragstads plass 2E                                                   O.S. Bragstads plass 2E
              NO-7491, Trondheim, Norway                                                NO-7491, Trondheim, Norway
                    +47 735 92743                                                             +47 735 94328
                  pitsilis@q2s.ntnu.no                                                   knapskog@q2s.ntnu.no

                                                                        sufficient quantities of data to the system for predictions to be
                                                                        computed accurately is described in the literature as “cold-start
ABSTRACT                                                                problem”[4].
Trust has been explored by many researchers in the past as a
successful solution for assisting recommender systems. Even             Our approach for overcoming the above problem is based on the
though the approach of using a web-of-trust scheme for assisting        idea of extending the neighboring base of new users so that they
the recommendation production is well adopted, issues like the          can be correlated with more participants, not necessarily linked
sparsity problem have not been explored adequately so far with          directly with each other via similarity relationships, but been
regard to this. In this work we are proposing and testing a scheme      discovered via “friends” as trustworthy for contributing useful
that uses the existing ratings of users to calculate the hypothetical   data. The trust for them can be inferred via their similar, and
trust that might exist between them. The purpose is to                  hence common, neighbors in a scheme that is known as ‘web-of-
demonstrate how some basic social networking when applied to            trust’. In this way, due to the propagation characteristics of trust,
an existing system can help in alleviating problems of traditional      it is plausible that similarity between entities that could not be
recommender system schemes. Interestingly, such schemes are             linked previously is becoming exploitable. Users who can be
also alleviating the cold start problem from which mainly new           discovered via friends-of-friends might be useful as they may
users are suffering. In order to show how good the system is in         carry valuable experience about some product that is of interest to
that respect, we measure the performance at various times as the        somebody else. As trust is mainly used for extending the number
system evolves and we also contrast the solution with existing          of relationships between people, users can now cooperate with
approaches. Finally, we present the results which justify that such     more participants than before and thus get access to more
schemes undoubtedly work better than a system that makes no use         recommendations. For short we call our system ‘hybrid’ as it
of trust at all.                                                        combines both trust networks and traditional recommender
                                                                        systems approaches. We used a framework called “Subjective
                                                                        Logic” for the reasoning of the virtual trust relationships, and
Keywords                                                                since the adaptation to existing recommender system models is a
Recommender Systems, Trust Modeling, data sparsity problem
                                                                        key issue, we used a model that we built our own for inferring
Cold-Start problem, Social network.
                                                                        trust from the existing users’ experiences.

1. INTRODUCTION                                                         The evaluation we present shows the twofold benefit of this
                                                                        approach as the accomplished reduction of sparsity is
Services offered by recommender systems tend to be hosted in
                                                                        accompanied by improved performance. To show the
centralized systems. Beside the benefit that is offered in terms of
                                                                        improvement with regard to the cold-start problem we
easiness in managing the resources and the availability of the
                                                                        demonstrate how some performance metrics evolve as the user
services, there are issues with regard to how much the new users
                                                                        community is growing. The rest of the paper is organized as
can receive the benefits of their participation in the system. As
                                                                        follows: Section 2 is referred to the description of the problem. In
new users we consider those who have not contributed enough
                                                                        section 3 we present the idea and the logical reasoning behind it.
data to the system and hence makes it difficult for predictions for
                                                                        An evaluation of the idea follows in section 4, analysis of work
them to be made. Similarly, the same problem seems to exist with
                                                                        that has been done in the area there is in section 5 and finally we
items which the users have not have much experience yet.
                                                                        conclude with a discussion of the results.
Recommender system services may be offered by social
networking platforms like FaceBook [1] and web-pages like
dealtime.com [2] or Amazon [3] and may be using mechanisms of           2. PROBLEM STATEMENT
User-Based recommender systems for working out predictions for          The sparsity-inherent problems of recommender systems are
items that users potentially like.                                      related to the fact that a satisfactory number of inferences for
                                                                        either users or items can not be extracted, due to lack of gathered
Any potential solution for alleviating the data sparsity issue
                                                                        information.
should not work at the expense of performance of such system but
instead it should provide some substantial benefits to users during     User-Based recommender systems that employ a technique called
their bootstrapping. The inability of new users in supplying            Collaborative filtering (CF) (in which the user preference for
some item is computed upon the similarities between the users),          explicitly because of the substantial adoption it has received for
mainly use Resnicks’s [10] formula (Equation. 1) for working out         studying the effects of trust propagation in user communities.
predictions ra,(i) for user a and items i. With wa,u is denoted the
Pearson’s similarity of user a with user u, and ru the rating of user
                                                                         3. DESCRIPTION OF THE APPROACH
u for the item that is of interest to a. The formula does not give
                                                                         User-based recommender systems that use Resnick’s formula are
satisfactory accuracy when sparse datasets are used, as the
                                                                         limited to computing predictions of ratings for users whose
predictions are highly sensitive to the number of similar
                                                                         similarity wa,u with the all contributing participants is known. The
participants, n.
                                                                         limited number of neighbors that can contribute during the system
                        n                                                bootstrapping is a significant constraint for achieving good
      ra ( i )  ra   w a , u ru ( i )  ru             (1)          performance during the early stages of the recommender system’s
                       u 1                                              life.
Even though the exploitation of social networks has been                 Since the accuracy of predictions that the querying user receives
recognized as a potential solution for addressing such problems          is dependent on the number of neighbors/predictors that appear to
[18], the applied solutions, at least as it seems, do not fulfill this   be similar to him/her, it means that a substantial improvement can
requirement completely. For example, in Epinions.com [17], a             be achieved if multiple predictors could be involved in the
well known commercial recommender system, the formation of               computation of ra,(i). As new users do not have enough
the trust network is done explicitly by asking users themselves to       experiences to contribute during the bootstrapping it means the
express whom they trust. In our opinion that is very unproductive        performance would be sub-optimal as there would not be enough
for two reasons: first, because not all users are familiar with the      links between them.
notion of trust and hence they are unable to explicitly express          One characteristic pitfall of a conventional recommendation
whom they trust, and second because people, as it happens for            system mechanism is the inability to incorporate prediction
item ratings, are unwilling to invest much time and effort in            ratings of other participants who have experienced some item that
contributing with their opinions. The latter is considered as the        is of interest to the querying agent, but their similarity with the
main reason for having sparse datasets.                                  querying agent cannot be inferred.
The cold-start problem has been approached in the past from the          If a recommender system were to be represented graphically, the
aspect of being a problem in social networking and recommender           similar users would appear to be within a distance of one hop
systems. In [20] it is suggested that special criteria should be used    away from the querying user. Exploitation of information that
for deciding to whom it is best for the new users to connect. The        resides at longer distances would be plausible if similarity could
decision in this case is based upon users’ so far objective              have propagative characteristics. Since trust is known for
assessment of candidates for their suitability and it is decided on      providing such property, which is the key idea of social networks,
how active they have been in contributing data to the system.            it means inferring trust from similarity could make it possible to
Nevertheless, such a decision is based solely on quantitative            overcome the above limitation. Pursuing this idea further, the
criteria and in our opinion it would be best if qualitative criteria     neighboring base of users could be extended beyond the one hop
(such as how useful such recommendations have been found to be           range by introducing a hybrid system in which the similarity
by the cold start users) were also used. In addition, in the above       could be inferred from trust.
mentioned solutions the social trust is not adequately exploited, as
                                                                         The above requirement for predicting the rating that some user i
the discovery for trustworthy participants is usually done by
                                                                         would give to item b can be expressed as follows:
applying local criteria.
                                                                             b  B |  a j  n ( a i ) : ri ( b )   r j ( b )  
It is crucial that when a new idea is applied to an existing system
                                                                                                                                         (2)
                                                                                                   a k  n ( a j ) : rk ( b )  
for enhancing its performance it is done in such a way that adapts
best to it. Therefore in our model, no additional data should be
required to be supplied by users, but the inferred trust should be       where B is the set of all items in the system, i,j,k are 3 users, a is
implicitly derived from the existing evidence instead. This              the set of users and n() is a function that denotes a similar
procedure is also useful from the practical aspect, since by doing       neighbor to i.
so the existing recommendation production cycle would not need
to change significantly to include the benefits that social              In a typical scenario of operation of a User-Based recommender
networking provides. Similar ways of implicit derivation of user’s       system (see figure 11) we assume that Alice and Bob have both
trust from evidence has been proposed for other purposes before,         experienced a number of items Ba and Bb respectively and hence it
like the EigenTrust algorithm [21] that was mainly build for peer-       can be known how similar they are. On the other hand Clark has
to-peer systems. However, the trust in that case was essentially         another set of common experiences with Bob, and hence it can
perceived as a global reputation value due to being independent          also be known how similar Bob is with him. However, Clark’s
on the point of view.                                                    experience about some new item that might be of interest to
                                                                         Alice, is not exploitable in the conventional system (Alice’s
To our knowledge, the use of trust networks for alleviating              similarity to Clark’s is not computable).
sparsity-inherent problems, such as the cold-start problem in
                                                                         Extending this consideration and inferring trust implicitly from
recommender systems have not been adequately studied so far.
                                                                         the calculated similarity for every pair of similar users, then
With regard to trust models and frameworks, there are many
developed so far, most of them mainly to meet special                    1
requirements of particular problems, and other more generic ones             The letters A…L represent the items rated by users and the
such as Subjective Logic [11]. We mention this framework                     numbers 1…5 the rates given.
finally a web-of-trust for social partners linked together can be             framework uses a simple intuitive representation of uncertain
developed.                                                                    probabilities by using a three dimensional metric that comprises
                                                                              belief (b), disbelief (d) and uncertainty (u) into opinions. It is
                  1   A
                                                                              required that evidence comes in such a form that opinions
                                                            1   M
                  4   B 5                    Bob                               p  { b pA , d pA , u pA } about some agent A with regard to the
                                                            1   B 2
                  2   C 3
                                                            5   H 4           proposition p can be derived from it, and thus be better
                  5   D 4
                  3   E 4
                                                            4   I 4           manageable due to the quite flexible calculus that the opinion
                                                            4   J 3           space provides. By convention the following rule holds for b,d,u:
                  4   F 4
                                              Functional    3   K 4
                      G 3                                                     b+d+u=1.
                          Recommender         trust for         L 1
     Alice                trust for Bob       Clark
                                                                              Subjective Logic provides the following two operators:
                                                                      Clark   recommendation (3) and consensus (4). Both can be used for
                      Indirect functional trust for Clark                     combining opinions and deriving recommendations regarding
                                                                              other agents in the social network.
       Figure 1. A typical recommendation production.
                                                                                        pA , B   pA   pB  {b pA , B , d pA , B , u pA , B }   (3)
One important issue that comes from the implicit derivation of
trust is concerned with the representation of the existing evidence           A, B are agents and  pB  {b pB , d pB , u pB } is the opinion of B
into trust metrics. For that reason some appropriate mapping is               about     p     expressed          as a recommendation to A.
necessary.                                                                     B  { b BA , d BA , u BA }      is the opinion of A about the
                                                                              recommendations of B. The consensus opinion  pAB is held by an
3.1 Trust Modeling
In socially aware systems, users benefit from their trust and                 imaginary agent AB representing both A and B.
connections with others as they can find people they need through
the people they trust. Links to a person imply some amount of                            p      p  {b pAB , d pAB , u pAB }              (4)
trust for this person. The importance of social networks is found
in the exploitation of such network data to produce information               The output values b,d,u of the combined opinions are derived
about trust between individuals which have no direct network                  from simple algebraic operations. More about this can be found in
connection. In theory about trust, this requirement is described              [11]. In our opinion the above considerations of Subjective Logic
with a property called transitivity which we are attempting to                are quite sufficient for deriving recommendations with regard to
exploit in this work. As trust is not perfectly transitive in the             the rating behavior of users whose subjective trustworthiness can
mathematical sense, however it can be useful in the way like in               be computed transitively within the social network of trusted
the real world people consider recommendations from others they               participants.
trust for their choices. Since trust that derives though                      In the literature trust is also distinguished into direct and indirect
recommendations is dependent on the point of view it means it is              trust, the former when it is derived from personal experience of a
merely subjective.                                                            trustor, and the latter when it is derived from recommendations of
In contrast to similarity, trust relationships can be propagated              others. Also, another distinction of it is functional trust, which
transitively throughout the network of users. In this way the                 expresses the trustworthiness for some agent with regard to some
common neighbors can act as trustworthy participants for users of             proposition p, and recommendation trust which expresses the
whom similarity is not known, but can be approximated via their               trustworthiness of some agent as a recommender.
derived trust. The concept of computing the indirect trust for                In our example depicted in figure 1, Bob has functional trust in
distant entities requires the employment of some suitable algebra             Clarke’s rating behavior, but the trust that can be derived by Alice
such as Subjective Logic. However, it is required that evidence               for Clark via Bob’s recommendation trust is indirect trust since
has first been transformed into some form that the trust algebra              Alice does not have her own evidence to support it, but merely
can use.                                                                      trusts Bob’s taste. Finally, the trust that Alice is interested in
Trust in subjective logic is expressed in a form that is called               knowing for Clark, is actually an indirect functional trust, which
opinion and is referred to a metric that originally was introduced            indicates how much she would trust him for his taste.
in Uncertain Probabilities theory [22], an extension to
                                                                              As far as the evidence transformation is concerned, various
probabilistic logic. An opinion expresses the belief about the truth
                                                                              models for converting ordinary observations into evidence have
of some proposition which may represent the behavior of some
                                                                              been proposed [11][19]. For our approach, we have used a simple
agent. The ownership of the opinion is also taken into account and
                                                                              model which is best suited to recommender system data [12]. In
this is what makes the assessment of trustworthiness subjective.
                                                                              our work we have come up with a solution of deriving the
Furthermore, this theory is suitable for modeling cases where
                                                                              trustworthiness that a pair of agents would place in each other by
there is incomplete knowledge. In the case of recommender
                                                                              using existing data such as ratings for items they have gathered
systems the lack of knowledge about some agent’s rating behavior
                                                                              experience with. In the same work an approach for mapping trust
comes from the fact that there are usually limited observations of
                                                                              into similarity is also introduced. This is explained below.
the rating behavior of some person. The lack of knowledge is
actually what shapes the subjective trust or distrust towards that            For calculating the uncertainty we used the simplified formula:
entity. The absence of both trust and distrust in opinions is                 u  ( n  1) 1 , in which n denotes the number of common
expressed by the uncertainty property. The subjective logic                   experiences in a trust relationship between two agents A and B.
The derivation of opinions from existing user experiences with                                      adjacent sparsity value. That is the percentage of ratings in the
items can be done by using an appropriate formula such as the                                       users by items matrix for which originally no values had been
one given below which we used for our experiment. This formula                                      provided.
was used for shaping the belief property (b) of Subjective Logic                                    To fulfill the requirement for calculations of similarity to be
from User Similarity (Wa,u) also known as Correlation                                               stable we assumed that similarity between two users is calculable
Coefficient.                                                                                        only if there are at least 10 items that have been rated by both of

                           b
                                   1
                                   2
                                               
                                     (1  u ) 1  Wa ,u
                                                        K
                                                                                      (5)
                                                                                                    them.
                                                                                                    It is worth mentioning that in the current experiment, no control
                                                                                                    has been applied on filtering the number of neighbors that a user
The disbelief property d of the opinions can easily be derived                                      can have, and the only limiting factor for forming a trust
from the remainder of b and u as: d =1-b-u.                                                         relationship is the number of common experiences that exist.

In our case scenario of recommender system the calculated belief                                    4.1 Metrics Used
(b) is referred to either recommender or functional trust. In                                       Predictive Accuracy metrics such as MAE are quite popular for
equation 5 the k value denotes the exponent in the equation used                                    measuring how close the recommender system’s predicted ratings
for transforming the similarity metric into derived belief (k=1 for                                 are to the true user ratings [13]. However, it is also interesting to
linear transformation).                                                                             know how good the system would be in successfully identifying
In the figure below we present pictorially a high level view of our                                 the items that users would be unhappy with, and therefore we also
hybrid system that could take advantage of this idea. The trust                                     used Classification Accuracy metrics. To justify this decision we
derivation mechanism for predicting the ratings that users would                                    claim that in the way the experiment was done, there were no
give to products can be easily embedded into the existing ordinary                                  rating predictions attempted for items that had no recorded rating
system.                                                                                             experiences and hence the danger that one might be lead into
                                                                                                    classification errors is significantly reduced.
               Similarity of Bob                                   Indirect Trust of
               for Clark                Direct Trust of            Alice for Clark
                                        Bob for Clark                                               4.1.1 Measuring Coverage
               Similarity of
               Alice for Bob            Direct Trust of                                             With Coverage we refer to the percentage of items for which
                                        Alice for Bob
                                                                 Similarity of Alice                predictions can be made and in [14] it is defined as:
                                                                                       Predicted
                                                                 for Clark

                                                                                                                           {b  B a j  n a i  : r j b  }
                                                                                       rating of
 Existing Ratings of                                                                   item L for
 Alice, Bob and                        Similarity of Alice                                                         ai  A
                                                                                                         C
                                                                                       Alice
 Clark                                 and Clark                                                                                                                     (6)
                                                                                                                                       BA
        Figure 2. The conceptual view of our hybrid system                                          where A and B are the set of users and products respectively, rj is
                                                                                                    a rating function and n(a) denotes the set of similar neighbors of
The part of the diagram shown in dotted line represents the                                         some user a.
existing recommendation production mechanism and it applies
only if evidence suffices for computing the similarity of Alice to                                  We introduce one new metric, User Coverage Gain (UCG), to
Clark.                                                                                              demonstrate the actual benefit that users receive when they make
                                                                                                    use of the trust graph. This metric relates the cost A with the
4. EVALUATION                                                                                       benefit R, expressed as a ratio of the hybrid system by the
We evaluated the performance of our hybrid approach against                                         standard CF. It can be computed using the following formula:
various alternatives such as the standard CF technique which                                                                R  R  1      
employs the Pearson similarity and uses Resnick’s prediction                                                     UCG   h   s   1                      (7)
formula [10].                                                                                                               Ah  As        
                                                                                                                                             TS
The results which demonstrate the effectivenes of predicting user
likeness express the ability of the system to identify potentially                                  Rh and Rs refer to the number of predictions that the system was
unsatisfactory options. Moreover, we introduce a set of metrics to                                  capable of performing for the new users in timestamp TS for the
demonstrate efficiency in terms of sparsity reduction as well as                                    hybrid and the standard recommender system respectively. Ah
effectiveness against the cold-start problem.
                                                                                                    and As refer to the sizes of populations of users which have
For the evaluation we used a subset of a movie recommendation
system called Movielens. This original dataset contains more than                                   made use of the trust network for discovering other participants at
1 million movie recommendations submitted during a period of                                        time TS, and correspondingly the adjacent number of users who
812 days by around 6000 users for 3100 movies. To capture the                                       would use the standard CF for performing those predictions. The
dynamic characteristics of performance that evolve over time, we                                    populations of users are expressed as in formulas (8) and (9). The
made use of the timestamp (TS) information that is attached on                                      formula for Ah is corresponding to the scenario that trust
every submitted rating. We divided the rating experiences into 5                                    propagation has been restricted to max distance of 2 hops only,
sets, each containing ratings submitted within the same period of                                   which is the case for our experiment. The items recommended in
time. Finally, we performed the experiment for each of those sets                                   every timestamp can be expressed as in formula (10)
separately. As the number of recommendations performed at each
                                                                                                       As  bB {a  A c  n(a ) : ra (b)   rc (b) }         (8)
stage is more important to be shown than the timestamp
information we considered as the best solution to present the
            Ah  bB {a  Ack n(a),d n(a)                         represents the ratio of instances that were correctly predicted as
                                                              (9)       non-satisfactory by the user against all instances that were
               d  n(ck ): ra (b)  rd (b) }                       predicted as non-satisfactory. Recall is: R  TP and
                                                                                                                                    TP  FN
                                                                        represents the number of instances that were correctly predicted
                 a A {b  B : ri (b) }
                     i
                                                               (10)     as non-satisfactory ones normalized by the total number of
                                                                        instances that actually received unsatisfactory rating. The F-score
For showing the level of contribution of the trusted participants to
                                                                        that shows the relative tradeoff between the benefits (TP) and the
a prediction to be made, we came up with a metric called Trust
                                                                        costs (FP) is calculated using the formula: F  2 PR .
Graph Contribution. This metric is presented in formula (11) and
                                                                                                                          PR
as can be seen it relates the relative increase in the number of        As rating values of 1 and 2 represent an unfortunate choice and 4
users (due to the use of trust network) used for the produced           and 5 a successful choice, we used the value 3 as threshold for
recommendations, with the actual number of recommendations              considering an experience as unsatisfactory. In table 1 we present
produced. This relative increase is expressed as the ratio of           the confusion matrix. We considered as true positives (TP) the
trusted neighbors by all neighbors (trusted and similar). With          instances that where correctly classified as receiving low rating
“trusted neighbors” we refer to those users in the social graph for     and false negatives (FN) those instances that were classified as
which their similarity with the querying user was derived by            having high rating, but still predicted as been non-satisfactory to
propagated trust recommendations and was not calculated directly        the users. True negatives (TN) denote those which were correctly
by correlating the common experiences (ratings) with some               classified as giving a high rating. Finally, false positives (FP) are
neighbor as it is done for the case of “similar” ones.                  referred to the number of bad items which were mistakenly
                        1       Ah                                 classified by the system as satisfactory ones for the new users.
                Ctrb   R                             (11)
                                 Ah  As  TS
                             iR 
                                                                                           Table 1. Confusion Matrix
                       
                                                                                           Predicted    Predicted       Predicted
In formula (11), As and Ah are the same as explained before, R is                Actual \               Value ≤ 2       Value > 2
the set of recommendations produced in some timestamp and is                         Rating ≤ 2             TP             FP
the sum of Rh and Rs. This metric demonstrates how effective the                      Rating > 2            FN             TN
discovery of neighbors of interest can be via the social network
and it is interesting to know how it contributes in the performance     The evaluation was done using the cross validation technique
improvement.                                                            leave-one-out applied on every user rating. The process was the
                                                                        following: every rating provided by each user was removed from
Beside new users, items for which there extensive experience is         the dataset and then its value was tentatively computed using the
not available are also affected by the cold-start problem.              trust network. The computed and the removed value are then
Therefore, we found it necessary to measure the improvements            compared and the error was calculated. The evaluation algorithm
that the hybrid solution would offer to them as well.                   is presented in the figure below.

4.1.2 Measuring Accuracy                                                  Algorithm: Evaluation plan
Predictive Accuracy is a standard metric for measuring how close
the predicted ratings are to the true user ratings. In our experiment     1. for all users i who have provided at least 10 ratings
we measure the MAE (Mean Average Error) which shows the                   2. for all items k of user i
absolute deviation between the two. However, as MAE can be                3. Pset ← ø
unimportant for showing the performance for items of interest to          4. for all users j whose common rated items with i > 10
users, we considered also using other accuracy metrics in addition        5.      if ( j similar to i )
to this.                                                                  6.         Pset ← Pset U { j }
                                                                          7.         Similar ← Similar +1
Classification Accuracy metrics are used to measure the
                                                                          8.      else if ( trust of i for j is computable)
frequency with which the system makes incorrect or correct
                                                                          9.         derive similarity of i for j from trust of i for j
decisions about whether an item is bad or good and it is usually
                                                                          10.         Pset ← Pset U{ j }
applied in connection with the task of finding lists of top items.
                                                                          11.         Trusted ← Trusted +1
As we mentioned previously we consider it more appropriate to             12. end for
demonstrate how useful the system is in helping users to avoid            13. predict k for over Pset
making choices of products that they might be unhappy with.               14. calculate MAE for k
Therefore we used the metric F-score, also called Harmonic                15. calculate TP,FP,NF,F-score for i
Mean, and it is used in information retrieval. F-score measures           16. end for
the effectiveness of retrieval with respect to the cost of retrieving     17. end for
the information [13].                                                     18. Trust Graph Contribution ← Trusted / (Trusted + Similar )
                                                                          19. average MAE for all i
It is necessary that the negative (N) and positive (P) instances are
clearly distinguished, and for our particular case we characterize
as N the case of experiences with products that the user would be
unsatisfied with and would give low rating. The opposite case                   Figure 3. The evaluation plan in pseudo-code
corresponds to P. Precision is defined as: P  TP and
                                                         TP  FP
With regard to coverage, only those values which were possible to                                               For the classification accuracy, the results show that there is quite
compute were considered for contributing to it.                                                                 a notable advantage of our method over the standard CF in
To study more closely the benefits that our system can offer to the                                             discovering those items that a user would be unhappy to choose.
entities that are mostly affected by the cold-start problem, we
repeated the experiments considering the new items and the new                                                                    Table 2. Accuracy expressed in F-score and MAE
users alone. To achieve that for every timestamp we filtered and                                                 Timestamp                                F-score                                MAE %
counted those entities which committed their first experience at                                                 (sparsity) \
                                                                                                                 Method                        Standard             Hybrid        Standard                  Hybrid
that particular timestamp.
                                                                                                                 1- (99,77 %)                  0,0836              0,0952          15.944                   15.686
4.2 Results – Discussion                                                                                         2- (98,57 %)                  0,1546              0,1832          14.562                   15.126
4.2.1 Overall Performance                                                                                        3- (97,50 %)                  0,2443              0,2720          14.652                   15.350
With regard to the overall performance of the system, we first                                                   4- (97,78 %)                  0,3030              0,3350          15.228                   15.676
demonstrate the results we obtained for the Coverage Gain and                                                    5- (95,71 %)                  0,3347              0,3598          15.210                   16.100
the Contribution of Trust Graph. In these diagrams, time is
represented by its adjacent sparsity value and is shown across the                                              As can be seen this advantage appears from early on (first
horizontal axis.                                                                                                timestamp), it maximizes at the second timestamp (highest
In figure 4 is shown how the Contribution of the Trust Graph                                                    difference between the F-score values of the standard and hybrid
develops over the experiment. For every recommendation                                                          models) when the data is still quite sparse, and continues so at all
produced, the ratio of trusted participants considered for this                                                 consecutive timestamps. It is interesting to note that this behavior
recommendation divided by all participants (trusted and similar)                                                is very unlikely to be coincidental as it appeared at all five
used for the same recommendation is counted and the results are                                                 different datasets we tested.
averaged. As can be seen from the diagram this metric follows in
general a decreasing trend throughout the simulation, but more                                                  One can also see that the predictive accuracy appears to be higher
importantly, it gets its maximum value towards the beginning                                                    in the proposed system than in the standard one, but here there is
when the trust network is still building up. We interpret this as a                                             temporary decrease during the early timestamps.
good indication that our system can cope well with the cold start                                                                              Predictive and Classification accuracy for all users
problem as the resources of the hybrid system are shown to be                                                              16,5                                                                                     0,6

exploited better at that timestamp. The decreasing trend followed                                                           16
afterwards is explained as the effect of gradual replacement of                                                                                                                                                     0,5
                                                                                                                           15,5
trust relationships by computed similarity as more and more
                                                                                                                            15
recommendations are submitted over time. In the same figure is
                                                                                                                                                                                                                    0,4


shown the benefit of using the hybrid system in terms of User                                                              14,5


                                                                                                                                                                                                                          F‐Score
                                                                                                                 MAE (%)


                                                                                                                                                                                                                    0,3
Coverage Gain in comparison to using the standard Collaborative                                                             14
Filtering. Interestingly enough, this metric follows an increasing                                                         13,5                                                                                     0,2

trend and more importantly it remains unaffected by the                                                                     13                                                                   CF (MAE)
decreasing rate of new users.                                                                                                                                                                    Hybrid (MAE)       0,1
                                                                                                                           12,5                                                                  CF (F-Score)
                                                                                                                                                                                                 Hybrid (F-Score)
                  Average Relative Benefit in performing recommendations when                                               12                                                                                      0
                                      using the Trust Graph
                                                                                                                                       99,77


                                                                                                                                                      98,93


                                                                                                                                                                        97,5


                                                                                                                                                                                         97,78


                                                                                                                                                                                                         95,71

            60                                                                                       45

                                                                                                     40
                                                                                                                                                              Timestamp - Sparsity (%)
            50


            40
                                                                                                     35                      Figure 5. The accuracy for the compared schemes
                                                                                                     30
            30
                                                                                                                4.2.2 Selective Performance
 Ctrb (%)


                                                                                                     25
                                                                                                          UCG


            20
                                                                                                     20         Next we present our results which demonstrate the behavior of the
            10
                                                                                                     15
                                                                                                                examined systems when considering only the new users and
             0
                                                                                                     10
                                                                                                                items. First we demonstrate the sizes of populations of cold-start
            -10                                                        User Coverage Gain
                                                                                                                users and items that appeared for the first time on each timestamp.
                                                                                                     5

            -20
                                                                       Contribution of Trust Graph
                                                                                                     0          From the diagrams in figure 6 it can be seen that in the case of the
                                                                                                                hybrid system, the cold-start users are shown to be committing
                     99,77


                                 98,57


                                                97,50


                                                               96,78


                                                                                        95,71


                                                                                                                their first experience with the hybrid system earlier than in the
                                         Timestamp - Sparsity (%)
                                                                                                                standard CF. In the first two timestamps, the number of new users
            Fig. 4. The Coverage Gain & Trust graph Contribution                                                in the former is higher than in the latter. As expected though, that
                                                                                                                trend is declining as the system develops over time. This looks
That is because the new users who join late get higher support as                                               quite reasonable from the way our experiment was done, as in
the friends-of-friends network is then denser than before.                                                      contrast to a real world running system, we used restricted size
As far as the rating accuracy is concerned the results for both the                                             sets of 100 users.
classification accuracy (F-score) and predictive accuracy (MAE)                                                 The early emergence of new users that appears in the hybrid
are shown in table 2 and are also graphically presented in figure                                               system is indicative of its success in exploiting the social network
5.                                                                                                              and performing predictions that wouldn’t be possible in the
conventional one, and hence attract more users. Consequently the
                                                                                                                                                                            Accuracy of predictions of new Items
same happens for cold-start items which are now discovered and                                                                                             18                                                                               0,35
rated by users earlier in the hybrid system than in the standard                                                                                           16
recommnder system.                                                                                                                                                                                                                          0,3
                                                                                                                                                           14
                                                                                                                                                                                                                                            0,25
                                                         New Items and Users                                                                               12
            800                                                                                                100
                                                                                                                                                                                                                                            0,2


                                                                                                                                                                                                                                                   F‐score
                                                                                                                                                MAE (%)
                                                                                             CF (Items)                                                    10
                                                                                                               90
            700                                                                              Hybrid (Items)
                                                                                             CF Users
                                                                                                                                                           8                                                                                0,15
                                                                                                               80
            600                                                                              Hybrid (Users)
                                                                                                               70                                          6
                                                                                                                                                                                                                                            0,1
            500                                                                                                                                                                                                          CF (MAE)
                                                                                                               60                                          4
                                                                                                                                                                                                                         Hybrid (MAE)


                                                                                                                          Users #
            400                                                                                                50                                                                                                        CF (F-score)       0,05
 Items #


                                                                                                                                                           2
                                                                                                                                                                                                                         Hybrid (F-score)
                                                                                                               40
            300                                                                                                                                            0                                                                                0
                                                                                                               30


                                                                                                                                                                    99,77


                                                                                                                                                                                98,57


                                                                                                                                                                                                             96,78


                                                                                                                                                                                                                                 95,71
                                                                                                                                                                                               97,5
            200
                                                                                                               20
            100
                                                                                                                                                                                          Timestamp - Sparsity (%)
                                                                                                               10

             0                                                                                                 0
                                                                                                                                                          Figure. 8. The benefit of hybrid system on new items
                      99,77


                                                98,57


                                                                  97,5


                                                                                   96,78


                                                                                                   95,71


                                                        Timestamp - Sparsity (%)
                                                                                                                                              The growing trend for F-score as it appears in fig. 7 is indicative
                               Figure. 6. The new users and items                                                                             of the increasing benefit that new users receive as the system
In table 3 we present the prediction and classification accuracy for                                                                          develops. In comparison to the classification performance for all
cold-start users and items and the results are pictorially presented                                                                          users presented in fig. 5, the new users receive higher benefit than
in figures 7 and 8. As far as prediction accuracy is concerned,                                                                               anyone else as they are potentially guided better to avoid products
from the results it can be seen that the deployment of the trust                                                                              they will be unhappy with.
system into the existing one has no impact on the accuracy of                                                                                 We consider the above two observations as a positive
ratings prediction, as the error is kept low (below 15%) during the                                                                           consequence for the proposed system for compensating the users
early stages of the system. For the new items the situation looks                                                                             early on. It is important that new users receive the highest benefit
quite a lot better as there is no noticeable penalty in the prediction                                                                        as they are assumed to be less tolerant in receiving poor
error against using the standard recommender system. In                                                                                       recommendations.       Experiencing      poor     recommendations
comparison to the overall performance results of figure 5, the new                                                                            consistently over time may reduce their trust towards the system
users receive higher benefit (MAE 14.87%) than the average user                                                                               and make them reluctant to rely on it for delivering good service.
(MAE 15.13%) during the early timestamps (at TS:2). However,                                                                                  If the original trust disappears, the users interest in using the
this benefit is diluted as the time progresses. Instead, new users                                                                            system may vanish altogether.
loose this advantage if using the standard system. (MAE:14.56%
                                                                                                                                                                Table 3. Accuracy for new Users and new Items.
opposed to 14.58%).
                                                                                                                                                                            Predictive accuracy in MAE
                    Predictive and Classification Accuracy for new Users
            20                                                                                                     0,4
                                                                                                                                                    Method                              New Users                        New Items
                             CF (MAE)                                                                                                               Timestamp
                             Hybrid (MAE)                                                                          0,35                             (sparsity)                Standard          Hybrid               Standard            Hybrid
            19
                             CF (F-score)
                             Hybrid (F-score)                                                                      0,3                               1 - (99,77 %)             15.62              15.8                13.00              13.52
            18

                                                                                                                   0,25
                                                                                                                                                          2 - (98,57 %)        14.58              14.87               15.06              15.27
            17
                                                                                                                                                                               14.75              15.04               15.32              15.71
                                                                                                                                    F‐score


                                                                                                                                                          3 - (97,50 %)
  MAE (%)


                                                                                                                   0,2
            16
                                                                                                                   0,15
                                                                                                                                                          4 - (97,78 %)        14.86              15.42               15.37              15.72
            15
                                                                                                                   0,1
                                                                                                                                                          5 - (95,71 %)        16.19              17.18               15.52              15.51
            14                                                                                                     0,05

                                                                                                                                                                      Classification accuracy in F-score
            13                                                                                                     0
                                                                                                                                                          1 - (99,77 %)     0.088       0.090     0.071                                  0.095
                     99,77


                                                98,57


                                                                  97,5


                                                                                     96,78


                                                                                                       95,71


                                                                                                                                                          2 - (98,57 %)     0.162       0.190     0.149                                  0.167
                                                            Timestamp - Sparsity (%)
                                                                                                                                                          3 - (97,50 %)     0.235       0.277     0.230                                  0.250
                  Figure. 7. The benefit of hybrid on new users
                                                                                                                                                          4 - (97,78 %)        0.286              0.337               0.262              0.274
Finally, it is also important to note the increasing trend in the
                                                                                                                                                          5 - (95,71 %)        0.275              0.326               0.265              0.291
average error as seen in fig. 7 which means that new users who
join the system late are less likely to receive good service than
those who join early.
Regarding classification accuracy for new users and new items,                                                                                5. BACKGROUND RESEARCH
our measurements show that the proposed hybrid system                                                                                         Trust has been the subject of investigation by many researchers in
outperforms the traditional one at all time instances.                                                                                        the past for alleviating issues connected with the use of sparse
                                                                                                                                              datasets in recommender systems. Singular Value Decomposition
has been proposed by other researchers and found to be better            [5] Sun X., Kong F., Ye S., A Comparison of Several Algorithms
than the standard collaborating filtering [5] for alleviating sparsity        for Collaborative Filtering in Startup Stage, In proc Networking
problems. Other approaches are based on the idea of removing                  Sensing and Control, IEEE, pp.25-28 (2005)
global effects and estimating the interpolation weights for each         [6] Bell R., Koren Y., Improved Neighborhood-based Collaborative
weighting factor for improving the accuracy of recommender                    Filtering, In proc IEEE International Conference on Data
systems [6]. Hybrid systems which combine content and                         Mining, pp.7–14 (2007)
collaboration have also been proposed in which various weights           [7] Melville P., Mooney R.L. , Nagarajan R., Content-Boosted
are set on the contribution of similarity [7]. In such an approach,           Collaborative Filtering for Improved Recommendations, In proc
the weight is dependent on the number of common items. In [15],               of Eighteenth national conf. of Artificial Intelligence, pp.187-
O’Donovan and Smyth study the effects of using trust models in                192, ISBN:0-262-51129-0 (2002)
the recommendation process and they demonstrate how it behaves           [8] Quercia D., Heiles S., Carpa L., B-trust: Bayesian Trust
against various attack scenarios. In [8], a solution for computing            Framework for Pervasive Computing. In proc 4th International
trust in CF systems has been investigated, but in the proposed                Conf. iTrust 2006. Lecture Notes in Computer Science
model the trustworthiness of the recommender is not taken into                (Vol.3986/2006). Springer, pp. 298-312 (2006)
account. In [9], in the work done by Lathia et.al., it is suggested      [9] Lathia N., Hailes S., Carpa L., Trust-Based Collaborative
that collaboration groups could better be formed by k-trusted                 Filtering, in proc IFIPTM, Springer, Vol. 263, pp.119-134,
neighbors rather than k-similar ones. In [16], the cold-start                 Trondheim, Norway (2008)
problem is approached using some idea based on machine                   [10] Resnick P., Varian H.R., Recommender Systems,
learning. Massa et al. in [23] has published a similar idea with              Communications of the ACM. 40(3), pp. 56-58 (1997)
ours, but based on different working hypothesis which requires           [11] Jøsang A., A Logic for Uncertain probabilities, International
that users would provide the trust statements themselves. To our              Journal of Uncertainty, fuzzi-ness & Knowledge based
knowledge trust has not been studied adequately so far as a                   systems,Vol.9, No.3 (2001)
solution to the cold-start problem. In addition, even though all the     [12] Pitsilis G., Marshall L.F., Modeling Trust for Recommender
studies performed can demonstrate the advantages of using trust,              Systems Using Similarity Metrics, in proc. IFIPTM, Springer,
they are merely static and do not capture the characteristics of the          Vol. 263, pp.103-118, Trondheim, Norway (2008)
community as it evolves. Since the cold-start problem is a time
                                                                         [13] Herlocker, J. L., Konstan, J. A., Terveen, L. G., and Riedl, J. T.
related issue we chose to demonstrate our proposed solution in a
                                                                              2004. Evaluating collaborative filtering recommender systems.
way that it can be shown if the advantage actually becomes                    ACM Trans. Inf. Syst. 22, 1 (Jan. 2004)
available when the system needs it the most.
                                                                         [14] Ziegler C-N., Towards Decentralized Recommender Systems,
                                                                              ISBN 363901149X, Vdm-Verlag, (2008)
6. CONCLUSION                                                            [15] O’Donovan J., Smyth B. ,Is trust Robust?: An Analysis of Trust-
We have proposed a hypothetical hybrid recommender system                     Based Recommendation, in Proc. of 11th International Conf. on
which uses trust to exploit the latent relationships between users            Intelligent User Interfaces IUI '06, pp.101-108, (2006).
and we have measured its performance. In this way, also                  [16] Xuan N. L., Thuc V., Trong D. L., Anh D. D., Addressing cold-
knowledge that exists at distant participants can be discovered and           start problem in recommendation systems, in Proc. of the 2nd
used by users who do not need to be known to each other. We                   international conf. on Ubiquitous information management and
used our modeling technique to build trust from existing evidence.            communication, Jan.31-Feb.01, Suwon, Korea, (2008).
The evaluation results show a significant benefit against the            [17] General consumer review site. http://www.epinions.com,
standard technique both in terms of coverage and in accuracy of
                                                                         [18] Golbeck J., Hendler J.. Inferring Trust Relationships in Web-
predictions. It is interesting to note that the benefit is more               Based Social Networks, ACM Transactions on Internet
distinguishable for new users and items which traditionally are               Technology, 6(4). (2006)
mostly affected by the sparsity problem. Furthermore, the higher
                                                                         [19] Josang A.,Bhuiyan T.,Xu Y.,Cox C., Combining Trust and
values achieved for F-score are indicative of improved ability in
                                                                              Reputation Management for Web-Based Services, in Proc. of the
protecting users from choosing products that they may not like.
                                                                              5th international conf. on Trust, Privacy and Security in Digital
With regard to the challenge of alleviating the cold-start problem,           Business,Turin, Italy, pp.90-99 (2008)
it can be seen that the benefits of using the trust enabled system
                                                                         [20] Victor P., Cornelis C., Teredesai A. M., De Cock M., Whom
are particularly visible early on when they are actually needed. A
                                                                              should I trust?: the impact of key figures on cold start
future challenge is to extend even further the period of time that
                                                                              recommendations , In proc. SAC '08: ACM symposium on
the benefit is received.                                                      Applied computing, pp. 2014-2018. (2008),
REFERENCES                                                               [21] Kamwar S. D, Schlosser M. T, Hector Garcia-Molina. “The
                                                                              EigenTrust algorithm for reputation management in P2P
[1]   Facebook Social Network service, http://www.facebook.com
                                                                              networks”. In: Proceedings of the 12th International Conference
[2]   General Consumer Review Site, http://www.dealtime.com                   on World Wide Web, Budapest, Hungary, 640-651 (2003)
[3]   Electronic Commerce Company, http://www.amazon.com                 [22] Shafer G., A Mathematical Theory of Evidence, Princeton
[4]   Maltz D., Ehrlish K., Pointing the Way: Active Collaborative            University Press (1976)
      filtering, In proc. of CHI-95,pp.202-209,New York, ACM Press       [23] Massa P., Avesani P.,Trust Aware Bootstrapping of
      (1995)                                                                  Recommender Systems, in proc. ECAI Workshop on
                                                                              Recommender Systems (2006).

</pre>