=Paper= {{Paper |id=Vol-1580/49 |storemode=property |title=Improving Neighborhood-Based Collaborative Filtering by a Heuristic Approach and an Adjusted Similarity Measure |pdfUrl=https://ceur-ws.org/Vol-1580/id49.pdf |volume=Vol-1580 |authors=Yasser El Madani El Alami,El Habib Nfaoui,Omar El Beqqali |dblpUrl=https://dblp.org/rec/conf/bdca/AlamiNB15 }} ==Improving Neighborhood-Based Collaborative Filtering by a Heuristic Approach and an Adjusted Similarity Measure== https://ceur-ws.org/Vol-1580/id49.pdf
Proceedings of the International Conference on Big Data, Cloud and Applications
Tetuan, Morocco, May 25 - 26, 2015



          Improving Neighborhood-Based Collaborative
       Filtering by A Heuristic Approach and An Adjusted
                       Similarity Measure

        Yasser El Madani El Alami                             El Habib Nfaoui                                     Omar El Beqqali
      Computer science department of                   Computer science department of                      Computer science department of
                  FSDM                                             FSDM                                               FSDM
       Sidi Mohammed Ben Abdellah                      Sidi Mohammed Ben Abdellah                          Sidi Mohammed Ben Abdellah
                 University                                      University                                         University
               Fez, Morocco                                     Fez, Morocco                                       Fez, Morocco
     yasser.elmadanielalami@usmba.ac.ma                 elhabib.nfaoui@usmba.ac.ma                         omar.elbeqqali@usmba.ac.ma

    Abstract— “Collaborative filtering” is the most used approach in         researches such as MovieLens [5] [6], they are being extended
    recommendation systems since it provides good predictions.               to other domains such as digital libraries [7], e-learning [8], etc.
    However, it still suffers from many drawbacks such as sparsity
    and scalability problems especially for huge datasets which              Authors in [3] show that there are three main categories of
    consist of a large number of users and items. This paper presents        information filtering: content-based recommendations [9],
    a new algorithm for neighborhood selection based on two                  collaborative filtering and hybrid recommendations where
    heuristic approaches. The first of which is based on selecting           collaborative filtering methods are the most used in
    users who rated the same items as the active user called                 recommender systems [10]. They rely on users’ evaluation
    “intersection neighborhood” while the second one builds the              (ratings) to identify “useful” items to these users.
    neighborhood using all users who rated at least one item in common       Unfortunately, many typical drawbacks are noticed in
    with the active user, called “union neighborhood”. In addition, we employ    collaborative filtering approaches which weaken thereafter the
    an adjusted similarity measure that combines Pearson                     quality of the recommendations such as sparsity and scalability
    correlation with a set-similarity measure (such as Jaccard               problems.
    similarity) as a correction coefficient for accurate similarities
    among users. Finally, experiments using the FilmTrust dataset show       Our work relies on using a heuristic approach in a
    that the proposed approaches give higher prediction accuracy             preprocessing step for building users’ neighborhood which
    than traditional collaborative filtering.                                relies on the well-known set operators: union and intersection.
                                                                             They induce a new ratings matrix with low dimension. This
        Keywords—Collaborative        filtering;   Neighborhood              ratings matrix is less sparse than the ratings matrix of the whole
    selection; Spatial complexity; Recommender system; Similarity            users. Therefore, this method leads to a minimum of time
    measure                                                                  consumption in the selection neighborhood phase. In parallel,
                                                                             we use a reformed similarity measure that combines the well-
    1 INTRODUCTION
                                                                             known Pearson correlation with set-similarity measures as
    Ever since the 90s, the amount of information has                        adjustment coefficient which yields good results.
    increased exponentially. The Internet has played a key
    role in information growth. Mobile devices such as smart                 This paper is organized as follows: in section 2 we give an
    phones and tablets also contribute to this continuous expansion          overview of the traditional collaborative filtering. In section 3
    of information plethora. Thus, users are continually faced with          we mention some recent works conducted in collaborative
    information overload. It becomes difficult for them to                   filtering field. Section 4 describes our proposals. In section 5,
    distinguish relevant information from noise. In order to address         we present the experiments and evaluation results of our
    this problem, there has been great interest in                           proposals. At the end, we give some perspectives, and a
    recommendation. According to [1] recommendation systems                  conclusion.
    have been considered as an effective means to reduce                     2 BACKGROUND
    complexity in information retrieval. They promise to
    personalize the request based on the user’s interest in a smart          The term of collaborative filtering (CF) was introduced by
    way [2]. As stated by [3] recommendation system helps users              David Goldberg in [11] where he proposed a mail system
    to deal with information overload and provides personalized              called Tapestry that filters documents based on users’ interest
    recommendations, content and services to them. It suggests the           in order to be used by other people. Collaborative filtering is
    appropriate items for each user according to his/her interests.          based on mutual aid of users who share similar tastes and
    Although, recommendation systems are largely used in both e-             preferences to recommend the suitable items. According to
    commerce applications such as Amazon [4] and academic                    [12], collaborative filtering relies on the following assumption:
                                                                             if users X and Y rate n items similarly or have similar




behaviors, then, in the future, they will act (rating or behavior)                 • Default rating: to set the same value for all unrated items.
on other cases similarly. As a result, CF based systems can
predict a user’s rating (or behavior) for an unknown item [11]                     • Pre-processing using average: to set an average rating
or create a top-N list of recommended items for a target user                        based on the user’s votes for the missing rating-matrix
(called active user) [13]. It is worth noting that the first work                    entry $\bar{r}$.
using CF was presented by Malone in [14] where he proposed                2.1.1.2 Neighborhood formation
stereotypes to build user models and use them to recommend
                                                                               The second step consists of measuring similarity between
relevant books to each user.
                                                                          the other users. There are several similarity algorithms such as
   According to [15], we can distinguish between two classes of           Pearson correlation, mean-squared difference [22] and
CF algorithms: memory-based algorithms and model-based                    Spearman correlation [23]. The most commonly used algorithm
algorithms. In what follows, we focus on the memory-based                 is the Pearson correlation. In fact, it has become a standard way
approach.                                                                 of calculating correlation [19]. Using Pearson correlation,
                                                                          similarity between user ua and ub is calculated with the
2.1     Memory based algorithm                                            following formula:
Memory based approach builds predictions based on the whole
                                                                          $\mathrm{sim}_{a,b} = \frac{\sum_{i=1}^{n} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i=1}^{n} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i=1}^{n} (r_{b,i} - \bar{r}_b)^2}}$   (1)
set of ratings that users assigned to items before. Previous
ratings are grouped in a matrix referred to as ratings matrix. It’s
the pillar input in this approach. It was the earliest approach
adopted by many commercial systems thanks to its simplicity               where n is the cardinal of the set of items, $r_{a,i}$ is the rating given
and effectiveness [16].                                                   by user a to item i and $\bar{r}_a$ is the average rating given by user a
                                                                          for all the items he rated. As an output, similarity process
                  Table 1 Example of ratings matrix
                                                                          returns a user similarity matrix which determines correlation
                                                                          between pairs of users. Thus, building similarity between users
    Items    Item 1     …      Item j        …          Item n            allows forming the requested neighborhood. Two techniques
Users                                                                     have been employed [24]:
User 1       1                 2             2          3
                                                                                    Threshold-based: user is considered as neighbor when
…                              2                        1                            his/her user similarity exceeds a given threshold [25].
User s                  1      ?                                                    K nearest users where k is given as input. Also, it can
…            3                               3          5                            be computed for each user as proposed in [26].

User p                  1      4                                          2.1.1.3 Recommendation generation
                                                                               This phase generates the predicted rating of user s for
As presented in table 1 above, also called ratings matrix, the                 item i. It’s calculated as an aggregation of similarities between
cell rsj refers to the rating given by user s to item j (on 1-5                the active user and his/her neighborhood. It relies on both
rating scale). In most cases, this ratings matrix is typically                 the ratings matrix (input) and the similarity matrix.
sparse [17] as most users do not rate viewed items regularly.
Therefore, [18] argued that the sparsity can be an issue that can             $p_{s,i} = \operatorname{aggr}_{u \in N} \, r_{u,i}$   (2)
lead to weak recommendations. Besides, the most popular
algorithm in memory based is neighbor-based algorithm which                   Various aggregation functions are employed in predictions
predicts ratings based on either users who are similar to the                 where the most used one is calculated as the weighted
active user or similar items to the requested item. Generally,                average of neighbors’ ratings using their similarities as
according to [19], there are three steps in processing a                      follows:
recommendation based on a CF system: i) Representation, ii)
Neighborhood formation, iii) Recommendation generation.                       $p_{s,i} = \bar{r}_s + \frac{\sum_{u=1}^{K} \mathrm{sim}_{s,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{K} \mathrm{sim}_{s,u}}$   (3)

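As an illustration only, formulas (1) and (3) can be sketched in Python as follows. The sketch assumes that ratings are stored as nested dictionaries (user, then item, then rating), that the sums of formula (1) run over co-rated items, and that the denominator of formula (3) uses absolute similarities so that negative weights do not cancel out. These choices, the names `pearson` and `predict`, and the toy data are assumptions of this sketch, not details given in the text.

```python
from math import sqrt


def mean(values):
    """Average of a non-empty collection of ratings."""
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users, in the spirit of formula (1).

    Assumption: deviations are summed over co-rated items only, and each
    user's mean is taken over all items that user rated.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = mean(ratings_a.values())
    mean_b = mean(ratings_b.values())
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # no variation on the co-rated items
    return num / (den_a * den_b)


def predict(active, item, ratings, k):
    """Weighted-average prediction in the spirit of formula (3)."""
    mean_s = mean(ratings[active].values())
    # Candidate neighbors: other users who actually rated the target item.
    candidates = [
        (pearson(ratings[active], ratings[u]), u)
        for u in ratings
        if u != active and item in ratings[u]
    ]
    neighbors = sorted(candidates, reverse=True)[:k]  # K most similar users
    num = sum(sim * (ratings[u][item] - mean(ratings[u].values()))
              for sim, u in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)  # assumption: absolute weights
    return mean_s if den == 0 else mean_s + num / den


# Hypothetical toy data, not taken from the paper.
ratings = {
    "a": {"i1": 4, "i2": 2, "i3": 5},
    "b": {"i1": 5, "i2": 1, "i3": 4, "i4": 5},
    "c": {"i1": 1, "i2": 5, "i4": 2},
}
prediction = predict("a", "i4", ratings, k=2)
```

In this toy example user "b" agrees with user "a" and user "c" disagrees, so both neighbors push the prediction for item "i4" above the mean rating of user "a".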

Neighbor-based approach is mainly divided in two analogous                K represents the size of selected neighborhood.
categories: users-based collaborative filtering [15] and item-
based collaborative filtering [20]. For example, in what                  Therefore, based on computed predictions, recommender
follows, we detail the User-Based CF.                                     system may select the top-N items as the recommendations list
                                                                          of unknown or new items that the active user has never seen
2.1.1    User-based CF (UBCF)                                             before.
    User-based recommendation relies on users’ similarity to the
active user. In fact, it builds predictions and recommendations           2.2    Performance measures
using the correlation between the active user and every other user.       Performance measures are the result of a step of monitoring a
                                                                          proposed method or algorithm in real situation or
2.1.1.1 Representation                                                    approximating the reality with reliable data. In
    The first step in UBCF consists of building a rating matrix           recommendation systems, a great research effort has been made
and assigning values to the unrated items to fill the sparse              to deal with this influential task such as presented in [27]. In
ratings matrix. Two processes [21] can overcome sparsity and              our case, we are interested in prediction performance measures.
improve recommendation accuracy:




In this area, many indicators are used to measure the system           similarity measure. In the literature, almost all works are based
performance. In most cases, they tend to evaluate the system           on the well-known Pearson correlation measure. As presented
accuracy. They measure the precision of computed predictions           in formula (1), the Pearson correlation measure doesn’t take into
compared to the real user ratings. A case in point is Mean             account other decisive factors which provide meaningful information
Absolute Error (MAE). In fact, MAE is a common statistical             about how users’ preferences differ. Sparsity is another
metric used to measure accuracy. It calculates the                     consistent problem which contributes to generating incorrect
average absolute difference between predicted ratings and real         recommendations. In fact, only a small number of all the
ones:                                                                  items are rated, so the ratings matrix becomes sparse. In
     $MAE = \frac{\sum_{(s,i)} \left| p_{s,i} - r_{s,i} \right|}{N}$   (4)          addition, using a huge dataset requires more time for
                                                                       computing similarities among users in order to build an
                                                                       effective neighborhood for the active user. Consequently,
In the formula above, $p_{s,i}$ is the predicted rating for user s to  combining these factors mainly increases the margin of error
item i, $r_{s,i}$ is the real rating given by user s to item i and N   and reduces the confidence interval which leads to inaccurate
corresponds to the number of predicted ratings calculated              recommendations.
during the test phase.
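As a minimal sketch of formula (4), the MAE can be computed as below. The dictionary layout keyed by (user, item) pairs, the function name `mean_absolute_error`, and the sample values are hypothetical and serve only as an illustration.

```python
def mean_absolute_error(predicted, actual):
    """Formula (4): average of |p_{s,i} - r_{s,i}| over the N test ratings."""
    if not predicted:
        raise ValueError("no predictions to evaluate")
    total = sum(abs(predicted[key] - actual[key]) for key in predicted)
    return total / len(predicted)


# Hypothetical predictions and observed ratings, keyed by (user, item).
predicted = {("s1", "i1"): 3.5, ("s1", "i2"): 2.0, ("s2", "i1"): 4.0}
actual = {("s1", "i1"): 3.0, ("s1", "i2"): 2.5, ("s2", "i1"): 4.0}
mae = mean_absolute_error(predicted, actual)
```

Here the absolute errors are 0.5, 0.5 and 0, so the MAE is one third; a lower value means predictions closer to the real ratings.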
                                                                       4 PROPOSED APPROACH
In [27] authors argue that using different evaluation metrics
leads to different conclusions concerning the recommendation           In order to limit the sparsity and computation-time
system performance. Thus, most researchers choose a single             problems, we propose a preprocessing step which relies on a
evaluation measure in order to demonstrate the effectiveness of        heuristic approach of neighborhood selection (figure 1).
their algorithms.
After giving a detailed overview of the background, and
presenting the broad outlines of collaborative filtering, in the
next section, we present some recent works and trends in
this area that aim to overcome the drawbacks and weaknesses of the
traditional approach.
3 RECENT WORKS
    In the last decade, the collaborative filtering approach has
motivated a large number of works, each adding an
original concept and opening a new perspective in order to
deal with the problem of recommendation. Clustering
is one of the most widely used concepts in CF. One of the                       Figure 1 The proposed process of collaborative filtering
earliest works that have been done in this area was presented in
[28] where the authors argued that clustering improves the             It can be done with two methods based on natural set
performance of recommendation. The work in [29] proposes to group users’    operators, without computing similarities among users. The first
profiles into clusters of similar items and compose the                method is the intersection selection. It’s based on selecting
recommendation list of items that match well with each cluster.        users who rate the same item as the active user. The second one
Authors in [30] develop Eigentaste 5.0 recommender system              is based on the union operator. It’s based on building the
that dynamically arranges the order of recommended items by            neighborhood with users who rated one item in common at
integrating user clustering with item clustering. In [31] the          least. Obviously, computation time is reduced because of the
author proposes a new probabilistic neighborhood-based                 reduction in the number of computed user similarities. In fact,
approach as an improvement of the standard k-nearest neighbor          our approach focuses on selecting neighbors who are likely to
algorithm. It’s based on classical metrics of dispersion and           be reliable to the active user before starting similarities
diversity as well as on some newly proposed metrics. The               computation phase which is time-consuming if we compute the
author also proposes the concept of unexpectedness in                  similarities for all system’s users. As a result, the new ratings
recommender systems. He also fully uses it by suggesting               matrix is smaller than the ratings matrix of the system used in
                                                                       the traditional collaborative filtering methods, which leads to
various mechanisms for specifying the expectations of the
                                                                       less sparseness in the matrix.
users. Moreover, he proposes a recommendation method for
providing the users with non-obvious but high quality                          A. Intersection neighborhood
personalized recommendations that fairly match their interests.
This method is based on specific metrics of unexpectedness. In             We call the first method of neighborhood selection the
[32] the authors propose an adapted normalization technique            intersection neighborhood. It relies on selecting neighbors who
called mutual proximity in the nearest neighbor selection phase        rate the same items. Actually, it’s rare to find two users who
to rescale the similarity space and symmetrize the nearest             rate the same items. So, in order to deal with this point, we use
neighbor relation. They prove that removing hubs and                   a threshold of acceptance which corresponds to a minimum
incorporating normalized similarity values into the neighbor           number of co-rated items. In what follows, we present the
weighting step leads to increased rating prediction accuracy.          adopted algorithm:

   One of the major factors in collaborative filtering that            1. For each active user.
greatly influences the recommendation accuracy is the selected




2. Extract the list Ia of items that the active user Ua rated          4. Select the target item whose rating value is going to be
    before. Ia={i1, i2…,ip} and card(Ia)=p.                                predicted. For instance, as shown in the example below
                                                                           (figure 5) we select the item I6.
3. Select users who rated the same list of items Ia, as presented
    in the figure below (figure 2), or who share at least a fixed
    number of co-rated items with the active user (threshold noted “TA”).
4. Select the target item whose rating value is going to be
    predicted. As shown in the example below, we select the
    item I6.




                                                                                          Figure 4 Basic ratings matrix

                                                                          As presented in this example (figure 5), the neighborhood
                                                                       contains two users U1 and U3. The new ratings matrix is:


                    Figure 2 Basic ratings matrix

As presented in this example (figure 3), the neighborhood
contains two users, U1 and U3, and the new adopted ratings
matrix is:

                                                                                         Figure 5 Derived ratings matrix

                                                                       5. Fill up the empty cells with the average of ratings of each
                                                                           user. As a direct result, the new ratings matrix contains
                                                                           more users than with the intersection algorithm, but also
                                                                           more empty cells.
                                                                       6. Compute the similarity between the selected users and the
                   Figure 3 Derived ratings matrix                         active user.
                                                                       7. Select the top n similar users N={u1, u2, … un}
5. Fill up the empty cells with the average of ratings of each
    user. As we can see, the new ratings matrix contains fewer         8. Generate the recommendation based on the selected
    empty cells than the first one.                                       neighborhood.
6. Compute the similarity between the selected users and the                   C. Similarity measure
    active user using the new matrix of ratings.
                                                                           In order to reduce the impact of fallacious similarities on
7. Select the top n similar users N={u1, u2, … un}                         the computed recommendations, we propose a modified
                                                                           version of Pearson correlation. In [33], authors argue that
8. Generate the recommendation based on the selected                       adding associated parameters of users x and y improves the
   neighborhood.                                                           similarity accuracy, which leads to good predictions. Thus,
        B. Union neighborhood                                              the new similarity measure is presented as follows:
The second method of neighborhood selection is called the                  $Sim(X,Y) = S(X,Y) \ast Corr_p(X,Y)$   (5)
union neighborhood. It relies on selecting neighbors who rate
at least one common item. The adopted algorithm is presented               where $Corr_p$ represents the traditional Pearson correlation
in what follows:                                                       and $S$ represents the Jaccard coefficient, used as adjustment
                                                                       coefficient:
1. For each active user
                                                $S(X,Y) = \frac{a}{a + b + c}$   (6)
2. We extract the list Ia of items that the active user Ua rated
   before. Ia={i1, i2…,ip} and card(Ia)=p                                  where
3. Select users who rated one item of the list Ia at least as                 a=|X∩Y| represents the number of attributes (the rated
    presented in the figure 4 below.                                           items) which are present in user X and user Y.
                                                                              b=|X-Y| represents the number of attributes (the rated
                                                                               items) which are present in user X and not in user Y.




          c=|Y-X| represents the number of attributes (the rated                          e.   Repeat step d until predicting all the missing
           items) which are present in user Y and not in user X.                                values
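The two neighborhood selection heuristics and the adjusted similarity of formulas (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions: ratings are nested dictionaries, the intersection method keeps users with at least `threshold` co-rated items, the union method is treated as the special case of one shared item, and the Pearson sums run over co-rated items. All function names and the toy data are hypothetical.

```python
from math import sqrt


def intersection_neighborhood(active, ratings, threshold):
    """Users sharing at least `threshold` co-rated items with the active user."""
    items_a = set(ratings[active])
    return {
        u for u, rated in ratings.items()
        if u != active and len(items_a & set(rated)) >= threshold
    }


def union_neighborhood(active, ratings):
    """Users who rated at least one item that the active user rated."""
    return intersection_neighborhood(active, ratings, threshold=1)


def jaccard(items_x, items_y):
    """Formula (6): a / (a + b + c) on the two users' sets of rated items."""
    a = len(items_x & items_y)   # rated by both users
    b = len(items_x - items_y)   # rated by X only
    c = len(items_y - items_x)   # rated by Y only
    return a / (a + b + c) if (a + b + c) else 0.0


def pearson(rx, ry):
    """Pearson correlation over co-rated items (an assumption of this sketch)."""
    common = set(rx) & set(ry)
    if not common:
        return 0.0
    mx = sum(rx.values()) / len(rx)
    my = sum(ry.values()) / len(ry)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in common)
    den = (sqrt(sum((rx[i] - mx) ** 2 for i in common))
           * sqrt(sum((ry[i] - my) ** 2 for i in common)))
    return num / den if den else 0.0


def adjusted_similarity(rx, ry):
    """Formula (5): Jaccard coefficient times Pearson correlation."""
    return jaccard(set(rx), set(ry)) * pearson(rx, ry)


# Hypothetical toy data, not taken from the paper's figures.
ratings = {
    "ua": {"i1": 4, "i2": 2, "i3": 5},
    "u1": {"i1": 5, "i2": 1, "i3": 4, "i6": 3},
    "u2": {"i3": 2, "i5": 4},
    "u3": {"i4": 1, "i5": 5},
}
inter = intersection_neighborhood("ua", ratings, threshold=2)
union = union_neighborhood("ua", ratings)
adjusted = adjusted_similarity(ratings["ua"], ratings["u1"])
```

Because the Jaccard coefficient lies between 0 and 1, the adjusted similarity never exceeds the Pearson correlation in magnitude; it shrinks the similarity of users who share few rated items.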
5 EXPERIMENTS AND RESULT                                                                   f.   Computing the MAE for each neighborhood
                                                                                                size by comparing the predicted values with
5.1        FilmTrust DataSet                                                                    the observed ones.
      For experiments we use the FilmTrust project dataset [34].                4.   Repeating this computation (from step 2) three times
      It is an academic research project being run by Jennifer                       and then we give the average of the MAE for each user
      Golbeck1. It’s a movie recommendation website where                            and neighborhood size.
      users can rate and review movies. Users can give their
      opinion using a quantitative value on a rating scale from           5.3        Results and analysis
      0.5 to 4 stars where 0.5 means bad and 4 means excellent.           5.3.1     Experiment 1
      The data is stored as semantic web annotations based on                  We start our test by comparing the traditional
      RDF2 and FOAF3. It integrates Semantic Web-based                         collaborative filtering method (TCFM) with the union
      social networks with movie ratings so as to compute                      neighborhood selection method (UNSM) by following the
      predictive movie recommendations. The collected dataset                  steps presented before. Even though the variation behavior
      consists of 1856 users, 2092 movies and 759922 ratings.                  of the union method is not regular, we can see (figure 6)
      Thus, around 80.4% of the global ratings matrix is empty.                that the union neighborhood selection method provides
      It means that FilmTrust dataset represents a real situation              good results since the computed MAE is less than the
      of sparsity problem.                                                     traditional collaborative filtering one.
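The sparsity figure quoted for the dataset can be checked with a short calculation. The snippet below only reuses the three counts stated in the text; the variable names are illustrative.

```python
# Counts reported for the FilmTrust dataset in the text.
users, movies, ratings = 1856, 2092, 759922

cells = users * movies              # size of the full ratings matrix
filled = ratings / cells            # fraction of cells holding a rating
sparsity = 1 - filled               # fraction of empty cells
sparsity_percent = round(sparsity * 100, 1)
```

With these counts the matrix has 3,882,752 cells, of which roughly one in five holds a rating, which matches the stated share of about 80.4% empty cells.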
5.2      Experiments steps
      Our test consists of two main experiments. The first
      experiment focuses on comparing the traditional approach
      with the union neighborhood selection method using the
      adjustment coefficient presented before. The second
      experiment focuses on comparing the
      traditional approach with the intersection neighborhood
      selection method using the same adjustment coefficient.
      The experiments respect the steps below:
      1.   From the dataset DS, for each user we build our ratings
           matrix based on the definition presented below.
      2.   We randomly select 30% of the ratings and set the
           value of those ratings to POSITIVE_INFINITY. In
           fact, we use this value to distinguish between the
           empty cells of the global ratings matrix, which are
           represented by a null value, and the modified ones.
           Therefore we build two sets:
      Set TG represents the training set (70% of the whole
      dataset) and Set T represents the test set, with:
      T=DS-TG
                                                                            [Plot: MAE (y-axis, ticks from 0,64 to 0,78) versus neighborhood size (x-axis, 10 to 90) for Union Neighborhood Selection and Traditional Collaborative Filtering]
                                                                                        Figure 6 MAE comparison between TCFM and UNSM
      3.   For each user from the set DS.
               a.   We select users according to the employed             5.3.2      Experiment 2
                    method (intersection or union)                                   The second experiment compares the traditional
                                                                                collaborative filtering method (TCFM) with the
               b.   We build the rating matrix TG by filling up                 intersection neighborhood selection method (INSM) by
                    the missing ratings with the average ratings of             following the steps outlined beforehand. In addition, in
                    each user.                                                  order to take user y as a candidate neighbor of the active
                                                                                user x in intersection approach, we set 10 as a threshold
               c.   Use different size of neighborhood from 10
                                                                                of co-rated items. Then, user ‘y’ will be selected if
                    users to 100.
                                                                                he/she has10 co-rated items at least.
               d.   Employ the prediction formula to predict the
                    missing values.
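
      The protocol described in this section (hold out 30% of the known
      ratings, fill the missing training cells with each user's mean, and
      keep as candidate neighbors only the users with at least 10 co-rated
      items) can be sketched in Python as follows. This is a minimal
      illustration, not the authors' implementation: the function names,
      the list-of-rows matrix layout, the use of None for empty cells and
      math.inf for POSITIVE_INFINITY, and the fallback mean of 0.0 for a
      user with no remaining ratings are all assumptions. The prediction
      formula itself is not reproduced here.

```python
import math
import random

# Sentinel for held-out test ratings (plays the role of POSITIVE_INFINITY).
HELD_OUT = math.inf


def is_observed(r):
    """True for a rating that is neither empty (None) nor held out."""
    return r is not None and r != HELD_OUT


def split_ratings(ratings, test_fraction=0.3, seed=0):
    """Mark a random share of the known ratings as held out.

    `ratings` is a list of rows (one per user); None marks an empty cell.
    Returns the training matrix TG and the test set T as
    {(user, item): true_rating}.
    """
    known = [(u, i) for u, row in enumerate(ratings)
             for i, r in enumerate(row) if r is not None]
    rng = random.Random(seed)
    test_cells = rng.sample(known, round(test_fraction * len(known)))
    training = [list(row) for row in ratings]
    test = {}
    for u, i in test_cells:
        test[(u, i)] = ratings[u][i]
        training[u][i] = HELD_OUT
    return training, test


def fill_with_user_mean(training):
    """Replace empty and held-out cells with the user's mean observed rating."""
    filled = []
    for row in training:
        observed = [r for r in row if is_observed(r)]
        # Assumption: a user with no observed rating gets a mean of 0.0.
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([r if is_observed(r) else mean for r in row])
    return filled


def co_rated_count(row_x, row_y):
    """Number of items rated by both users in the training matrix."""
    return sum(1 for a, b in zip(row_x, row_y)
               if is_observed(a) and is_observed(b))


def intersection_candidates(training, x, threshold=10):
    """Users sharing at least `threshold` co-rated items with active user x."""
    return [y for y in range(len(training))
            if y != x and co_rated_count(training[x], training[y]) >= threshold]


def mean_absolute_error(predictions, test):
    """MAE over the held-out cells; `predictions` maps (user, item) to a value."""
    return sum(abs(predictions[cell] - test[cell]) for cell in test) / len(test)
```

      In such a sketch, the candidate selection runs on the matrix that
      still carries the sentinels, so held-out ratings never count as
      co-rated items; the mean-filled matrix is only used afterwards, for
      similarity computation and prediction.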


1 https://www.cs.umd.edu/~golbeck/
2 http://www.w3.org/RDF/
3 http://www.foaf-project.org/original-intro




         Figure 7 MAE comparison between TCFM and INSM
         (y-axis: MAE; x-axis: neighborhood size, 10 to 100; series:
         Traditional Collaborative Filtering, Intersection Neighborhood
         Selection)

    Even though the variation of the intersection method is not
regular (figure 7), the intersection neighborhood selection method
(INSM) performs better than the traditional collaborative filtering
method (TCFM).

5.3.3    Comparative analysis

The last figure (figure 8) compares the MAE of the traditional
collaborative method with that of the two proposed methods,
intersection and union neighborhood selection, regardless of the
size of the neighborhood. As we can see, the union neighborhood
selection method (UNSM) gives the best result.

                   Figure 8 Synthesis of all approaches
                   (y-axis: MAE; series: INSM, UNSM, TCFM)

6 CONCLUSIONS AND FUTURE WORKS
         In this paper, we proposed a preprocessing step which
   relies on two heuristic methods: the union neighborhood
   selection method and the intersection neighborhood selection
   method. Both are combined with a reformed Pearson correlation
   similarity in which a set-similarity measure (Jaccard
   similarity) serves as a correction factor. As presented before,
   the two methods provide acceptable results compared to
   traditional collaborative filtering.
   As a result of these neighborhood selection methods, we reduce
   the rating-matrix dimension, which leads, on the one hand, to
   less sparseness in the induced matrix and, on the other hand, to
   less time consumed in the neighborhood selection phase. In
   addition, the prediction accuracy is improved.
   Despite this, the two approaches are not efficient for the cold
   start problem, especially the intersection approach, which needs
   a threshold of ratings to provide good recommendations. As
   future work, we will investigate incorporating social network
   data in order to deal with this problem. In fact, social
   networks offer many opportunities for recommendations since
   people generally use their social networks to obtain reliable
   and useful information.

                             REFERENCES
[1] A. K. Joseph, "Introduction to recommender systems: Algorithms and
     Evaluation," ACM Transactions on Information Systems, vol. 22, no. 1,
     pp. 1-4, 2004.
[2] Z. Zied, "Modèle multi-agents pour le filtrage collaboratif de
     l'information," Université du Québec, Montréal, 2010.
[3] A. Gediminas and T. Alexander, "Towards the Next Generation of
     Recommender Systems: A Survey of the State-of-the-Art and Possible
     Extensions," IEEE Transactions on Knowledge and Data Engineering,
     vol. 17, no. 6, pp. 734-749, 2005.
[4] L. Greg, S. Brent and Y. Jeremy, "Amazon.com Recommendations:
     item-to-item collaborative filtering," Internet Computing, IEEE, vol. 7,
     no. 1, pp. 76-80, 2003.
[5] M. Bradley N., A. Istvan, L. Shyong K., K. Joseph A. and R. John,
     "MovieLens Unplugged: Experiences with an Occasionally Connected
     Recommender System," in the 8th international conference on Intelligent
     user interfaces, Miami, Florida, 2003.
[6] B. N. Miller, "Toward a Personal Recommender System," Minneapolis,
     USA, 2003.
[7] A. Smeaton and J. Callan, "Personalization and recommender systems in
     digital libraries," International Journal on Digital Libraries, vol. 5, no.
     4, pp. 299-308, 2005.
[8] N. Manouselis, D. Hendrik, R. Vuorikari, H. Hummel and R. Koper,
     "Recommender Systems in Technology Enhanced Learning," in
     Recommender Systems Handbook, Springer US, 2011, pp. 387-415.
[9] L. Pasquale, d. G. Marco and S. Giovanni, "Content-based
     Recommender Systems: State of the Art and Trends," in Recommender
     Systems Handbook, Springer US, 2011, pp. 73-105.
[10] P. L. Joel, L. Nuno, N. M. María, A. F. Ana and M. Constantino, "A
     hybrid recommendation approach for a tourism system," Expert Systems
     with Applications, vol. 40, no. 9, pp. 3532-3550, 2013.
[11] D. Goldberg, D. Nichols, B. M. Oki and D. Terry, "Using collaborative
     filtering to weave an information tapestry," Communications of the
     ACM, vol. 35, no. 12, pp. 61-70, 1992.
[12] K. Goldberg, T. Roeder, D. Gupta and C. Perkins, "Eigentaste: A
     Constant Time Collaborative Filtering Algorithm," Information
     Retrieval, vol. 4, no. 2, pp. 133-151, 2001.
[13] M. Deshpande and G. Karypis, "Item-based top-N recommendation
     algorithms," ACM Transactions on Information Systems (TOIS), vol. 22,
     no. 1, pp. 143-177, 2004.
[14] T. W. Malone, K. R. Grant, F. A. Turbak, S. A. Brobst and M. D. Cohen,
     "Intelligent information-sharing systems," Communications of the ACM,
     vol. 30, no. 5, pp. 390-402, 1987.
[15] J. S. Breese, D. Heckerman and C. Kadie, "Empirical analysis of
     predictive algorithms for collaborative filtering," in the Fourteenth
     conference on Uncertainty in artificial intelligence, San Francisco, USA,
     1998.
[16] M. D. Ekstrand, J. T. Riedl and J. A. Konstan, "Collaborative Filtering




     Recommender Systems," Foundations and Trends in Human-Computer
     Interaction, vol. 4, no. 2, pp. 81-173, 2011.
[17] P. Melville and V. Sindhwani, "Recommender Systems," in
     Encyclopedia of machine learning, Springer US, 2010, pp. 829-838.
[18] E. Vozalis and K. Margaritis, "Analysis of Recommender Systems'
     Algorithms," in HERCMA 2003, Athens, Greece, 2003.
[19] B. Sameer, "Recommender System Algorithms," Toronto, Canada, 2008.
[20] B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based collaborative
     filtering recommendation algorithms," in the 10th international
     conference on World Wide Web, New York, USA, 2001.
[21] E. Vozalis and K. Margaritis, "Analysis of Recommender Systems'
     Algorithms," in Proceedings of the 6th Hellenic European Conference
     on Computer Mathematics and its Applications (HERCMA-2003),
     Athens, Greece, 2003.
[22] U. Shardanand and P. Maes, "Social information filtering: algorithms for
     automating “word of mouth”," in the SIGCHI Conference on Human
     Factors in Computing Systems, New York, USA, 1995.
[23] M. Kendall and J. D. Gibbons, Rank Correlation Methods, Oxford:
     Oxford University Press, 1990.
[24] S. Gong, "A Collaborative Filtering Recommendation Algorithm Based
     on User Clustering and Item Clustering," Journal of Software, vol. 5, no.
     7, pp. 745-752, 2010.
[25] J. L. Herlocker, J. A. Konstan, L. G. Terveen and J. T. Riedl, "Evaluating
     collaborative filtering recommender systems," ACM Transactions on
     Information Systems, vol. 22, no. 1, pp. 5-53, 2004.
[26] N. K. Lathia, "Evaluating Collaborative Filtering Over Time," London,
     UK, 2010.
[27] A. Gunawardana and G. Shani, "A survey of accuracy evaluation metrics
     of recommendation tasks," Journal of Machine Learning Research, vol.
     10, pp. 2935-2962, 2009.
[28] B. M. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Recommender
     Systems for Large-scale E-Commerce: Scalable Neighborhood
     Formation Using Clustering," in Proceedings of The fifth international
     conference on computer and information technology, 2002.
[29] M. Zhang and N. Hurley, "Novel item recommendation by user profile
     partitioning," in The 2009 IEEE/WIC/ACM International Joint
     Conference on Web Intelligence and Intelligent Agent Technology, 2009.
[30] T. Nathanson, E. Bitton and K. Goldberg, "Eigentaste 5.0: constant-time
     adaptability in a recommender system using item clustering," in
     Proceedings of the 2007 ACM conference on Recommender systems,
     2007.
[31] P. Adamopoulos, "On Discovering non-Obvious Recommendations:
     Using Unexpectedness and Neighborhood Selection Methods in
     Collaborative Filtering Systems," in Proceedings of the 7th ACM
     international conference on Web search and data mining, New York,
     NY, USA, 2014.
[32] P. Knees, D. Schnitzer and A. Flexer, "Improving Neighborhood-Based
     Collaborative Filtering by Reducing Hubness," in Proceedings of
     International Conference on Multimedia Retrieval, New York, NY,
     USA, 2014.
[33] E. M. E. A. Yasser, N. El Habib and E. B. Omar, "An Adjustment
     Similarity Measure for Improving Prediction In Collaborative Filtering,"
     in The 2nd International Workshop on Software Engineering and
     Systems Architecture, Tetouan, Morocco, 2014.
[34] [Online]. Available: http://trust.mindswap.org/FilmTrust/about.shtml.
     [Accessed 1 September 2013].



