How Similar is Rating Similarity to Content Similarity?

Osman Başkaya
Department of Computer Engineering
Bahçeşehir University
İstanbul, Turkey
osman.baskaya@computer.org

Tevfik Aytekin
Department of Computer Engineering
Bahçeşehir University
İstanbul, Turkey
tevfik.aytekin@bahcesehir.edu.tr

ABSTRACT

The success of a recommendation algorithm is typically measured by its ability to predict the rating values of items. Although accuracy in rating value prediction is an important property of a recommendation algorithm, there are other properties of recommendation algorithms which are important for user satisfaction. One such property is the diversity of recommendations. It has been recognized that being able to recommend a diverse set of items plays an important role in user satisfaction. One convenient approach to diversification is to use the rating patterns of items. However, in what sense the resulting lists will be diversified is not clear. In order to assess this, we explore the relationship between the rating similarity and the content similarity of items. We discuss the experimental results and the possible implications of our findings.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms

Experimentation, Measurement

Keywords

diversity, recommender systems, collaborative filtering

1. INTRODUCTION

Recommender systems help users to pick items of interest based on explicit or implicit information that users provide to the system. One of the most successful and widely used techniques in recommender systems is collaborative filtering (CF) [7]. CF algorithms try to predict the ratings of a user based on the ratings of that user and the ratings of other users in the system. The performance of collaborative filtering algorithms is typically measured by the error they make in predicting the ratings of users for items. Although the accuracy of predictions is an important aspect of recommender systems, it is not the only one. Recently, increasing the diversity of recommendation lists has gained attention among researchers in the field [8, 2]. Being able to recommend a diverse set of items to a user is important for user satisfaction, because a recommendation list consisting of one type of item (e.g., movies only from the same genre) might not be very satisfactory even if the accuracy of rating prediction is high. But there is one issue here: we need to define a metric for measuring the diversity of a recommendation list. Then we can try to optimize the recommendation list based on this metric. One possible metric for measuring the diversity of the recommendation list of a particular user is described in [2]. This metric measures the diversity as the average dissimilarity of all pairs of items in a user's recommendation list. Formally, it can be defined as follows:

    D(R) = \frac{1}{N(N-1)} \sum_{i \in R} \sum_{j \in R,\, j \neq i} d(i, j),    (1)

where R is the recommendation list of a user and N = |R|. d(i, j) is the dissimilarity of items i and j, which is defined as one minus the similarity of items i and j.
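As a concrete illustration, the following is a minimal sketch of equation (1) in Python. The function name and the choice of cosine as the underlying similarity measure are our own assumptions; the metric itself only requires that d(i, j) be one minus some similarity.

    import numpy as np

    def intra_list_diversity(item_vectors):
        """Average pairwise dissimilarity D(R) of a recommendation list.

        item_vectors has one row per recommended item (e.g., rating or
        tf-idf vectors); d(i, j) is taken here to be 1 - cosine(i, j).
        """
        X = np.asarray(item_vectors, dtype=float)
        n = X.shape[0]
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # guard against all-zero rows
        sim = (X / norms) @ (X / norms).T
        dis = 1.0 - sim
        # Sum d(i, j) over all ordered pairs with i != j (the diagonal is 0).
        return (dis.sum() - np.trace(dis)) / (n * (n - 1))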
We think that average dissimilarity is a reasonable way to measure the diversity of a list of items. However, the important part is how to define d(i, j), i.e., the dissimilarity of two items, which is left unspecified in equation (1). The problem is not choosing a similarity metric such as Pearson correlation or cosine similarity. The problem is whether we can use the rating patterns (vectors) of items in order to measure their similarity, and, if we use these rating patterns, in what respect the recommendation lists will be diversified. For example, in a movie recommender system, will the recommendation lists contain more movies from different genres, or will the content of the movies get diversified?

In order to answer these questions we compare rating similarity with two types of content similarity, which we define below. We hope that the results we discuss will shed some light on these types of questions and stimulate discussion on diversification.

2. RELATED WORKS

In hybrid recommendation, content information is used to increase the accuracy of rating predictions, especially for items whose ratings are too sparse. For example, [3, 5, 6] use content information collected from sources such as
Wikipedia and IMDB in order to improve the accuracy of rating predictions. These works indirectly show that there is indeed some positive relationship between rating similarity and content similarity; otherwise, it would not be possible to increase the prediction accuracy using content information.

Another paper which comes close to our concerns is [1]. Here, the authors propose a new algorithm for diversifying recommendation lists. Their algorithm uses the rating patterns of movies for diversification. They evaluate the results by looking at how well the recommendation lists are diversified with respect to genre and the movie series they belong to. They report that the diversity of the resulting lists increases in both respects (genre and series). However, to the best of our knowledge, there are no direct comparisons between rating and content similarity. In this paper we examine these two types of similarity directly.

3. ITEM CONTENT GENERATION

In our experiments we use the Movielens 1M dataset (http://www.grouplens.org/node/73). In order to compare movies' rating patterns to their contents, we first need to generate movie content information. We use two sources of information to this end. One source of content information comes from the Wikipedia articles corresponding to movies in the Movielens dataset. The other source comes from the genre information provided in the dataset. Details of content generation are given below.
3.1 Content Generation from Wikipedia

The Movielens dataset contains 3883 distinct movies and 6040 users. Some of these movies are not rated by any user, and some have no corresponding entries in Wikipedia. After discarding these movies, we are able to fetch 3417 movie articles (approximately 88% of all movies) from Wikipedia.

In this work we only use the text of each Wikipedia article (we do not use the link structure or the category information of articles). The text of a Wikipedia article consists of parts such as "Plot", "Cast", and "Release". We do not include the "References" and "See also" parts of the text, since they may contain information which is unrelated to the content of the movies. After extracting the text of each document, we apply some basic preprocessing steps such as stemming and stop-word removal. We use a vector space model to represent the text documents.
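To make this step concrete, the following is a minimal sketch of how the vector space representation could be built with scikit-learn. The paper does not name its tooling, so TfidfVectorizer, the built-in English stop-word list, and the load_article_texts helper are our assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical loader: returns one preprocessed (stemmed) article body
    # per movie, with the "References" and "See also" parts already removed.
    article_texts = load_article_texts()

    # Build the Movie-TFIDF matrix: one tf-idf weighted term vector per movie.
    vectorizer = TfidfVectorizer(stop_words="english")
    movie_tfidf = vectorizer.fit_transform(article_texts)  # movies x terms, sparse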
3.2 Genre Information

As a second source of content we use the genre keywords (such as adventure, action, comedy, etc.) provided by the Movielens dataset. Each movie in the dataset is associated with one or more genre keywords. We define the genre similarity between two movies using the Jaccard metric given below:

    J(i, j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|},    (2)

where G_i and G_j are the genre sets of items i and j.
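Equation (2) translates directly into code. The sketch below assumes each movie's genre keywords are available as a Python set; the function name is ours.

    def genre_similarity(genres_i, genres_j):
        """Jaccard similarity J(i, j) between two genre sets (equation 2)."""
        if not genres_i and not genres_j:
            return 0.0  # convention for two movies without genre keywords
        return len(genres_i & genres_j) / len(genres_i | genres_j)

    # Example: one shared genre out of three distinct ones gives 1/3.
    print(genre_similarity({"Action", "Sci-Fi"}, {"Action", "Thriller"}))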
4. EXPERIMENTS

In the first set of experiments we try to understand the relation between movie rating patterns and the content generated from the corresponding Wikipedia articles. We have two matrices: one is the Movie-User matrix, which holds the ratings of users on movies, and the other is the Movie-TFIDF matrix, which holds the tf-idf weights of each document. For evaluation we use the following methodology. For each movie we find the 100 most similar movies using the Movie-User matrix (the rating neighborhood) and the 100 most similar movies using the Movie-TFIDF matrix (the content neighborhood). We then find the number of common items in these two neighborhoods. It turns out that on average there are 14.74 common movies in the two neighborhoods. If we generate the neighborhoods randomly, this value turns out to be around 2.80. Randomization tests show that this difference is significant (p < 0.01).

We ran the same experiment with different neighborhood sizes (20 and 50), but the percentages of common items in the rating and content neighborhoods turn out to be similar to the percentages we get with a neighborhood of size 100.
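A minimal sketch of this methodology follows. It assumes movie_user and movie_tfidf are dense NumPy arrays with one row per movie, and it uses cosine similarity to rank neighbors, a choice the paper does not explicitly fix.

    import numpy as np

    def top_k_neighbors(X, k=100):
        """Indices of the k most similar rows for each row of X,
        under cosine similarity, excluding the row itself."""
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        norms[norms == 0] = 1.0               # guard against all-zero rows
        sim = (X / norms) @ (X / norms).T
        np.fill_diagonal(sim, -np.inf)        # a movie is not its own neighbor
        return np.argsort(-sim, axis=1)[:, :k]

    rating_nbrs = top_k_neighbors(movie_user, k=100)    # rating neighborhoods
    content_nbrs = top_k_neighbors(movie_tfidf, k=100)  # content neighborhoods

    # Average overlap of the two neighborhoods (14.74 in the paper's data).
    overlaps = [len(set(r) & set(c)) for r, c in zip(rating_nbrs, content_nbrs)]
    print(np.mean(overlaps))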
We also test whether there is a relationship between the number of ratings and the correspondence between rating and content similarity. To see this we find the rating and content neighborhoods of those movies which have a similar number of ratings. To do this we divide the movies into rating intervals according to the number of ratings they have: movies which have between 1 and 100 ratings, between 101 and 200, and so on. If an interval has fewer than 20 movies, we merge it with the previous one in order to increase the significance of the results. Figure 1 shows the average number of common items in the rating and content neighborhood sets of movies as a function of rating intervals. Interestingly, Figure 1 shows a clear linear correlation, i.e., as the number of ratings increases, the number of common items in the content and rating neighborhoods of movies also increases.

[Figure 1: Average number of common movies as a function of rating intervals.]

One possible explanation of this positive linear correlation might be the following. Generally, there is a positive relationship between the number of ratings and the popularity of a movie. This means that popular movies receive ratings from many different people with different tastes. Hence the rating patterns of popular movies reflect a diverse set of characteristics. Wikipedia movie articles also have rich contents reflecting different characteristics of movies. This might explain why a movie's rating neighborhood approaches its content neighborhood as the number of ratings increases.
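The binning procedure can be sketched as follows. The interval width of 100 and the minimum of 20 movies per interval come from the text; the function name and input format are our assumptions.

    def bin_movies_by_rating_count(rating_counts, width=100, min_size=20):
        """Group movie indices into intervals 1-100, 101-200, ... by their
        number of ratings, merging any interval with fewer than min_size
        movies into the previous one."""
        bins = {}
        for movie, count in enumerate(rating_counts):
            bins.setdefault((count - 1) // width, []).append(movie)
        merged = []
        for key in sorted(bins):
            if merged and len(bins[key]) < min_size:
                merged[-1].extend(bins[key])  # too small: merge into previous
            else:
                merged.append(bins[key])
        return merged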
In the next set of experiments our aim is to understand the relationship between movie rating patterns and the movie genres provided in the Movielens dataset. Genre keywords provide limited information compared to Wikipedia articles, because Wikipedia articles contain terms that give information not only about the genre of a movie but also about its director, cast, musical composition, etc.

In order to measure the relationship between movie rating patterns and genres, we apply a similar methodology. For each movie m we find the 100 most similar movies using the Movie-User matrix (that is, the rating neighborhood) and compute the Jaccard similarity (as defined in equation 2) between movie m and the movies in its rating neighborhood. The average Jaccard similarity value turns out to be 0.43. If we generate the rating neighborhood randomly, we find a Jaccard value around 0.17. Randomization tests show that this difference is significant (p < 0.01).
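Combining the earlier sketches gives the measurement loop for this experiment. Here genre_sets (a list of genre sets, one per movie), rating_nbrs, and genre_similarity are the assumed structures and helpers introduced above.

    import numpy as np

    # Mean genre similarity between each movie and its rating neighborhood.
    avg_jaccard = np.mean([
        np.mean([genre_similarity(genre_sets[m], genre_sets[n])
                 for n in rating_nbrs[m]])
        for m in range(len(genre_sets))
    ])
    print(avg_jaccard)  # the paper reports about 0.43, vs. 0.17 for random neighborhoods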
We also test whether there is a relationship between the number of ratings and genre similarity. Similar to the experiment described above, we divide the movies into rating intervals according to the number of ratings they have. Then for each movie m in a rating interval we calculate the Jaccard similarity value between movie m and its rating neighborhood of 100 movies, and we calculate the averages per rating interval. Figure 2 shows these average values as a function of rating intervals. Here we again have an interesting case: there is a negative linear correlation, which means that the more ratings a movie has, the more its rating similarity diverges from its genre similarity.

[Figure 2: Average Jaccard index as a function of rating intervals.]

The reason underlying these results might be the following. Movies which have a limited number of ratings (unpopular movies) are generally watched by fans of their genre. For example, a fan of sci-fi movies may also watch an unpopular sci-fi movie. So unpopular movies generally get ratings from the same set of users, who are fans of that movie's genre, and this makes the rating vectors of movies of the same genre similar to each other. On the other hand, if a movie is popular, it gets ratings from a diverse set of users, which causes its rating neighborhood to diverge from its genre.

5. CONCLUSION

We should note at the outset that the conclusions presented here are not definitive. Different experiments on different datasets and with different item types need to be done in order to draw firmer conclusions. However, we hope that these experiments and results will stimulate discussion and further research.

In this work we examined the relationship between the rating similarity and the content similarity of movies in the Movielens dataset. We examined two kinds of content: one is the tf-idf weights of movie articles in Wikipedia, and the other is the genre keywords of movies provided by the Movielens dataset.

We found that, to a certain degree, there is a correspondence between rating similarity and Wikipedia content similarity, and also between rating similarity and genre similarity. However, we leave the magnitude of these correspondences open to discussion. We also found that as the number of ratings of a movie increases, its rating similarity approaches its Wikipedia content similarity, whereas its rating similarity diverges from its genre similarity.

According to these results, if diversification is done based on the rating patterns of movies, then the recommendation lists will likely be diversified with respect to the content of the movies to some extent. So, if no content information is available or it is difficult to obtain, it might be useful to use rating patterns to diversify the recommendation lists.

We plan to add to this analysis the latent characteristics of items generated by matrix factorization methods [4], and to explore the correspondences among similarities defined over rating patterns, contents, and latent characteristics of items.

6. REFERENCES

[1] R. Boim, T. Milo, and S. Novgorodov. Diversification and refinement in collaborative filtering recommender. In CIKM, pages 739-744, 2011.
[2] N. Hurley and M. Zhang. Novelty and diversity in top-N recommendation - analysis and evaluation. ACM Trans. Internet Techn., 10(4):14, 2011.
[3] G. Katz, N. Ofek, B. Shapira, L. Rokach, and G. Shani. Using Wikipedia to boost collaborative filtering techniques. In RecSys, pages 285-288, 2011.
[4] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
[5] A. Loizou and S. Dasmahapatra. Using Wikipedia to alleviate data sparsity issues in recommender systems. IEEE, pages 104-111, 2010.
[6] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI/IAAI, pages 187-192, 2002.
[7] J. B. Schafer, D. Frankowski, J. L. Herlocker, and S. Sen. Collaborative filtering recommender systems. In The Adaptive Web, pages 291-324, 2007.
[8] M. Zhang and N. Hurley. Avoiding monotony: improving the diversity of recommendation lists. In RecSys, pages 123-130, 2008.