=Paper=
{{Paper
|id=None
|storemode=property
|title=Using Social Data for Personalizing Review Rankings
|pdfUrl=https://ceur-ws.org/Vol-1271/Paper5.pdf
|volume=Vol-1271
|dblpUrl=https://dblp.org/rec/conf/recsys/SureshREV14
}}
==Using Social Data for Personalizing Review Rankings==
Vaishak Suresh*, Syeda Roohi*, Magdalini Eirinaki (Computer Engineering Department, San Jose State University, CA, USA) and Iraklis Varlamis (Department of Informatics and Telematics, Harokopio University of Athens, Greece)

*Vaishak Suresh and Syeda Roohi are currently affiliated with Intuit Inc. and HP, respectively.

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster City, CA, USA. Copyright held by the authors.
ABSTRACT

Almost all users look at online ratings and reviews before buying a product, visiting a business, or using a service. These reviews are independent, authored by other users, and thus may convey useful information to the end user. Reviews usually have an overall rating, but most of the time there are sub-texts in the review body that describe certain features/aspects of the product. The majority of web sites rank these reviews either by date or by overall “helpfulness”. However, different users look for different qualities in a product/business/service. In this work, we address this problem by proposing a system that creates personalized rankings of these reviews, tailored to each individual user. We discuss how social data, ratings, and reviews can be combined to create this personalized experience. We present our work in progress using the Yelp Challenge dataset and discuss some first findings regarding implementation and scalability.

Keywords

Personalization; recommendation engine; feature ranking; sentiment analysis

1. INTRODUCTION

With more and more businesses selling their products or advertising their services online, customers rely on word of mouth in the form of customer reviews to make their decisions. Most of the current websites that feature product/business/service reviews list these reviews in reverse chronological order, or by employing heuristic metrics (e.g. ranking higher the reviews of “super users”, i.e. users with many reviews, or those with the most “helpful” votes). However, such a generic ranking requires users to read, or at least scan, the tens or hundreds of reviews for one product/business/service.

Moreover, different people value different aspects of the same product/business/service. For example, when searching for a digital camera, one user might be interested in the price and size, whereas another may value the ease of use. Similarly, when searching for a good Italian restaurant, one user might value the ambience and wine list of a place, while another might prefer restaurants that are family-friendly. Ideally, users would like to be presented with only those reviews that highlight the qualities of a product/service that they value.

In this work, we present a system framework that addresses the above issue. In a nutshell, we create user profiles that reflect each user’s preferences for specific restaurants and restaurant qualities (e.g. food, ambience, etc.). The profiles are created using the rating data, as well as implicit preferences identified by applying aspect-based opinion mining to the reviews. Using these profiles, we identify similar users and rank their reviews for new restaurants higher. We also integrate the social network of the user, identifying those friends who have similar preference patterns with the active user, and highlight their reviews. Therefore, for the same restaurant, two different users will see a different list of reviews. The system is accompanied by a user-friendly interface that also highlights the main aspects of each review, so that the user does not have to read the full text. To achieve this, we employ aspect-based opinion mining and neighborhood-based collaborative filtering techniques and integrate them in our system.

We also present a system prototype, built using the Yelp dataset (http://www.yelp.com/dataset_challenge/), to demonstrate a first approach to this interesting problem [7]. Without loss of generality, we focus on restaurant review recommendations; however, our approach is easily extended to any other product/business/service, as long as reviews, ratings, and an underlying social network are available. The personalized presentation of reviews is a subjective matter and therefore very hard to evaluate without involving real users, but we provide some first empirical results. We should stress that this is a work in progress, and our focus in this paper is to introduce this mash-up idea along with an initial approach to the problem, as well as our thoughts on how such a system could be further enhanced.

The rest of the paper is organized as follows: we present our system’s design in detail in Section 2. We provide a first-cut approach to extending the proposed model using social network connections and feedback in Section 3. Some discussion of the prototype implementation and evaluation is included in Section 4. An overview of the related work is provided in Section 5, and we conclude with our plans for future work in Section 6.

2. SYSTEM DESIGN

The system architecture is shown in Figure 1. It comprises two main modules: an offline processing module, where the user profiles are generated and the feature extraction and rating happen, and an online module, which generates real-time recommendations.

2.1 Offline Processing

There are two phases of offline processing: aspect summarization and user preference generation.

2.1.1 Aspect Summarization

This module aims at extracting the important features from each review, along with their polarity weight. To perform this, we employ the subjectivity lexicon [8] in order to map weak and strong positive and negative words to numeric values (ranging from -4 to +4). Using a master list of positive and negative opinion words from an opinion lexicon [5], we created a list of negation words (not, no, nothing, etc.), which invert the sentiment, and intensifiers (too, very, so, etc.), which increase the intensity of the sentiment (these are referred to as “TOO words” in our algorithm).
More specifically, the words of each review are tagged using the default POS (part-of-speech) tagger from NLTK (http://www.nltk.org), a natural language processing Python package; this is done using the Treebank corpus. The text, augmented with tags, is then split into sentences and then into words. Each word is then examined to determine its type.

[Figure 1: System Architecture]

If a word is POS-tagged as an adverb or an adjective, it is considered an opinion word. If the opinion word is POS-tagged as superlative or comparative, the score is set to the maximum (+4) or minimum (-4), based on the polarity. During this process, the words that modify the polarity (e.g. “not”) and the degree (e.g. “too”, “very”) are also considered when scoring the opinion word; the presence of these words can invert or increase the sentiment score of the aspect, respectively. The words POS-tagged as nouns are potential candidates for feature words. Apart from using the pre-defined feature lookup file, these words are also tested for synonyms using the WordNet interface in NLTK.

Once the features and opinion words in a sentence are determined, each feature is mapped to an opinion word based on the distance between them. The aggregated opinion score for each feature is calculated over all the sentences in the review, as described above, and the review document is updated with these values in the system’s database. The algorithm performing this process is outlined in Figure 2.
For each sentence in the review
  For each word in the sentence
    if the POS of the word is Adverb (RB) and it is a TOO word
      Save the TOO word position
      if the word is 'and'
        Continue the TOO rule
    # opinion word
    if the word is in the subjectivity lexicon or in the master list
      if the POS tag is a superlative adverb (RBS) or adjective (JJS)
        Set the superlative flag
      if the POS tag is a comparative adverb (RBR) or adjective (JJR)
        Set the comparative flag
      if the word is in the subjectivity lexicon
        Set the word score
      else if the word is in the positive master list
        Set the word score to +1
      else if the word is in the negative master list
        Set the word score to -1
      if a TOO word exists and is adjacent
        if the word is positive, increase the score by 1
        if the word is negative, decrease the score by 1
      if the superlative flag is set
        if the word is positive, set the score to +4
        if the word is negative, set the score to -4
      if the comparative flag is set
        if the word is positive, set the score to +3
        if the word is negative, set the score to -3
      if the opinion word is in a negative context
        negate the sentiment of the score
      Save the opinion_word_position and score
    # feature word
    if the POS of the word is a Noun (NN)
      if the word is in the feature list or is a synonym of a feature
        Save the feature and position
  Apply the opinion scores to the potential features in the sentence
  Aggregate the score for each feature

Figure 2: Opinion score assignment algorithm
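To make the flow of Figure 2 concrete, the following is a minimal Python sketch of the scoring pass. It assumes the review has already been POS-tagged (e.g. with NLTK’s pos_tag); the lexicons, the TOO-word list, and the feature list below are illustrative placeholders for the resources of [5] and [8], and negation is simplified to the sentence level.

# Minimal sketch of the Figure 2 scoring pass. The lexicons below are
# placeholders for the subjectivity lexicon [8] and the opinion-word
# master lists [5]; negation is simplified to the sentence level.

NEGATIONS = {"not", "no", "nothing", "never"}          # invert sentiment
TOO_WORDS = {"too", "very", "so", "really"}            # intensify sentiment
SUBJECTIVITY = {"great": 3, "awful": -3, "decent": 1}  # word -> score in [-4, +4]
POSITIVE_MASTER = {"good", "tasty"}
NEGATIVE_MASTER = {"bad", "bland"}
FEATURES = {"food", "service", "ambience", "price"}

def score_review(tagged_sentences):
    """tagged_sentences: one list of (word, POS tag) pairs per sentence,
    e.g. from nltk.pos_tag(). Returns {feature: aggregated opinion score}."""
    feature_scores = {}
    for sentence in tagged_sentences:
        opinions, features, too_positions = [], [], set()
        negated = any(w.lower() in NEGATIONS for w, _ in sentence)
        for idx, (word, pos) in enumerate(sentence):
            w = word.lower()
            if pos == "RB" and w in TOO_WORDS:
                too_positions.add(idx)                 # save TOO word position
            if w in SUBJECTIVITY or w in POSITIVE_MASTER | NEGATIVE_MASTER:
                score = SUBJECTIVITY.get(w, 1 if w in POSITIVE_MASTER else -1)
                if pos in ("RBS", "JJS"):              # superlative: max polarity
                    score = 4 if score > 0 else -4
                elif pos in ("RBR", "JJR"):            # comparative: +/-3
                    score = 3 if score > 0 else -3
                elif idx - 1 in too_positions:         # adjacent TOO word
                    score += 1 if score > 0 else -1
                if negated:                            # negative context
                    score = -score
                opinions.append((idx, score))
            if pos.startswith("NN") and w in FEATURES: # candidate feature word
                features.append((idx, w))
        for f_idx, feat in features:                   # map feature to nearest opinion
            if opinions:
                _, sc = min(opinions, key=lambda o: abs(o[0] - f_idx))
                feature_scores[feat] = feature_scores.get(feat, 0) + sc
    return feature_scores

# score_review([[("The", "DT"), ("food", "NN"), ("was", "VBD"),
#                ("very", "RB"), ("good", "JJ")]])  -> {"food": 2}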
2.1.2 User Profile Generation

In order to generate personalized review rankings, we follow a neighborhood-based collaborative filtering approach. Given a user $u$ and the set of businesses they have rated and/or reviewed, $B_u$, each user is represented by a profile vector $U = (p_1, \ldots, p_k)$, where $p_i$ denotes the preference of the user for a business $i$, and $k$ is the total number of businesses in the system. We define $p_i$ as follows:

$$p_i = \begin{cases} s_{ui} & \text{if } i \in B_u \\ 0 & \text{if } i \notin B_u \end{cases} \qquad (1)$$

where $s_{ui}$ represents the cumulative preference score of user $u$ for business $i$, calculated using their overall rating or their opinion on specific aspects of the business, as identified from their review of it. We introduce three alternative ways of calculating the preference score, namely using only the rating of the user, using the opinion scores of the specific review, or weighing them by the overall preference/dislike of the user for each aspect, as shown in Equations 2, 3, and 4, respectively.

Rating-based preference score:

$$s_{ui} = r_{ui} \qquad (2)$$

where $r_{ui}$ denotes the star rating of user $u$ for business $i$.

Business-based preference score:

$$s_{ui} = \sum_{a \in R_{ui}} o_a \qquad (3)$$

where $R_{ui}$ denotes the set of aspects included in the review of user $u$ for business $i$, and $o_a$ is the opinion score calculated for aspect $a$ in this particular review.

Review-based preference score:

$$s_{ui} = \sum_{a \in R_{ui}} w_{ua} \cdot o_a \qquad (4)$$

where $w_{ua}$ denotes the overall preference/dislike of user $u$ for aspect $a$, as expressed by their opinions in all the reviews they have written. This can be calculated as the normalized sum of all the scores $o_a$ over all the reviews $R_u$.
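To make Equations 2-4 concrete, the sketch below computes the three preference-score variants for a single review. The {aspect: opinion score} review representation, and normalizing $w_{ua}$ by the total absolute aspect score, are our assumptions for illustration (the text only specifies a normalized sum).

# The three preference-score variants of Equations 2-4 for one review.
# The {aspect: opinion score} review representation and the
# normalization used for w_ua are illustrative assumptions.

def rating_score(star_rating):
    """Equation 2: s_ui is simply the star rating r_ui."""
    return star_rating

def business_score(review_aspects):
    """Equation 3: sum of the opinion scores o_a over the aspects R_ui."""
    return sum(review_aspects.values())

def aspect_weights(user_reviews):
    """w_ua: normalized sum of the scores o_a over all reviews R_u."""
    totals = {}
    for review in user_reviews:
        for aspect, score in review.items():
            totals[aspect] = totals.get(aspect, 0.0) + score
    norm = sum(abs(v) for v in totals.values()) or 1.0
    return {a: v / norm for a, v in totals.items()}

def review_score(review_aspects, weights):
    """Equation 4: aspect scores weighted by the user's overall w_ua."""
    return sum(weights.get(a, 0.0) * o for a, o in review_aspects.items())

# Example: a user with two reviews; p_i for a reviewed business i
reviews = [{"food": 3, "service": -1}, {"food": 2, "ambience": 1}]
w = aspect_weights(reviews)
p_i = review_score(reviews[0], w)   # per Equation 1, p_i = 0 if i not in B_u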
Once the user profiles are created, we employ a user-based collaborative filtering technique to find similar users. In our implementation, we have used the Pearson correlation coefficient and the open-source libraries provided by Apache Mahout.
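The prototype delegates the similarity computation to Mahout; the plain-Python sketch below only illustrates the underlying Pearson correlation between two profile vectors, computed over the businesses both users have scored (the sparse dictionary representation is an assumption).

# Pearson correlation between two sparse profile vectors, computed over
# the businesses both users have scored. The prototype obtains this from
# Apache Mahout; this version only illustrates the formula.
from math import sqrt

def pearson(profile_a, profile_b):
    """profile_a, profile_b: {business_id: preference score}."""
    common = profile_a.keys() & profile_b.keys()
    if len(common) < 2:
        return 0.0                       # not enough overlap to correlate
    mean_a = sum(profile_a[i] for i in common) / len(common)
    mean_b = sum(profile_b[i] for i in common) / len(common)
    cov = sum((profile_a[i] - mean_a) * (profile_b[i] - mean_b) for i in common)
    var_a = sum((profile_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((profile_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0                       # a flat profile has no correlation
    return cov / sqrt(var_a * var_b)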
2.2 Online Recommendations

This step is used to rank and recommend reviews in real time, as the user navigates the system and searches for new restaurants. When a given user searches for a specific restaurant, the recommendation engine computes the similarity of the current user with all the reviewers of the particular business, and ranks and presents the related reviews in descending order of similarity. As a result, each user will be presented with a different set of reviews for the same business.

Moreover, the interface allows the end user to get the gist of the reviews without the need to read the entire review text. For each review, the overall star rating, as well as the most important aspects of the review, are prominently shown. The aspects are intuitively marked as strong/weak positive/negative, using colors and thumbs-up/down images. We should stress that the same aspect might appear in more than one review, and one review might contain more than one aspect.
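As an illustration of this online step, a minimal sketch that orders a business’s reviews by the similarity of each reviewer to the active user; pearson() and the profile dictionaries are those of the previous sketch, and the review tuples are an assumed structure.

# Rank the reviews of one business for the active user: reviews written
# by the most similar users come first. Assumes the pearson() function
# and the profile dictionaries sketched earlier.

def rank_reviews(active_profile, reviews, profiles):
    """reviews: list of (reviewer_id, review_text) tuples;
    profiles: {user_id: {business_id: preference score}}."""
    return sorted(reviews,
                  key=lambda rev: pearson(active_profile,
                                          profiles.get(rev[0], {})),
                  reverse=True)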
3. SOCIAL NETWORK FEEDBACK

When available, information related to the user’s social network can be incorporated in our model. There are two alternative ways this can be done: either at the last step of the process, or during profile generation.

In the first case, the similarity between the user and their friends is calculated when the user searches for the restaurant. The friends’ reviews for this restaurant are separately ranked and presented in a different list, so that they are easily identifiable.

In the second case, the user preferences are weighed by the opinion scores of the user’s friends. To incorporate the social network feedback in the model, we extend Equation 1 as follows:

$$p_i = \begin{cases} w_{F_u i} \cdot s_{ui} & \text{if } i \in B_u \\ w_{F_u i} & \text{if } i \notin B_u \end{cases} \qquad (5)$$

where $F_u$ is the set of friends of user $u$, and $w_{F_u i}$ can be defined as follows:

$$w_{F_u i} = \frac{\sum_{f \in F_u} s_{fi}}{|F_u|} \qquad (6)$$

Equation 6 can be easily extended to incorporate the similarities between users.

Note that in this extension we also address the cold-start problem, since the user profile can be filled in through social network feedback, even when the user has few or no reviews/ratings in the system.
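For illustration, a minimal sketch of Equations 5 and 6, under the assumption that each friend’s profile is available as a dictionary of preference scores (the data structures are ours, not the prototype’s):

# Sketch of the social extension: Equation 6 averages the friends'
# preference scores for a business, and Equation 5 uses that weight to
# scale the user's own score, or to stand in for it when the user has
# not reviewed the business (this is what mitigates cold start).

def friend_weight(friend_profiles, business):
    """Equation 6: w_Fu,i as the mean preference score of the friends."""
    if not friend_profiles:
        return 0.0
    return sum(p.get(business, 0.0) for p in friend_profiles) / len(friend_profiles)

def social_profile_entry(own_score, friend_profiles, business, reviewed):
    """Equation 5: p_i = w_Fu,i * s_ui if i is in B_u, else w_Fu,i."""
    w = friend_weight(friend_profiles, business)
    return w * own_score if reviewed else w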
4. PROTOTYPE EVALUATION

We have already implemented a prototype based on the system design described in the previous sections, using the Yelp dataset. Our prototype implements the business-based preference profile, assuming that the product aspects are predetermined. A screenshot of our prototype is shown in Figure 5. Each review is accompanied by some metrics showing the calculated polarity and subjectivity of the review, as well as the similarity of each reviewer to the user. The end user may further refine the personalized list of reviews by filtering only those that come from his/her friends, or by feature (e.g. location, food, etc.). More technical details on the implementation are included in [7]; a screencast of the prototype is available at http://youtu.be/vMz5CobpIw4.

[Figure 5: Client application – personalized recommended reviews]

We have load-tested the prototype, deployed on a Tomcat server on a machine with the following configuration: Intel i5-2410M CPU @2.30 GHz, 64-bit OS, 4 GB RAM. As shown in Figure 3, the response time increases linearly with the number of users, and the system can handle multiple simultaneous requests in real time (the system crashed after 175 simultaneous requests, as MongoDB cannot handle that many connections).

[Figure 3: Response time (ms) per number of concurrent users (25 to 175)]

We also performed an empirical evaluation of the recommendations using the following methodology: we randomly picked 50 users and generated top-5 recommendations for a specific restaurant. We then asked human evaluators to rate each recommended review on the following scale: 1 = “irrelevant”, 2 = “somewhat relevant”, 3 = “very relevant”. To assign the rankings, the evaluators were asked to identify 2-4 aspects highlighted in each user’s review (the aspects were not identical to the ones used by our prototype; instead, the evaluators were asked to identify anything that stood out, e.g. that the user favors short reviews, or values price/service/food). If the recommended review included more than 50% of the aspects, it received a 3; if it was very uninformative or did not include any aspects, it received a 1; everything else received a 2. We employ precision as our evaluation metric and define Prec3 and Prec2, measuring how many recommendations received a “3”, or a “2 or 3”, rating, respectively.

[Figure 4: Average precision (Prec3 and Prec2), for all users and with users grouped by total number of reviews (1/2 reviews vs. 3+ reviews)]
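For clarity, Prec3 and Prec2 reduce to the following computation; the ratings below are made up for illustration.

# Minimal computation of the Prec3/Prec2 metrics from a list of
# evaluator ratings (1 = irrelevant, 2 = somewhat relevant,
# 3 = very relevant).

def precision_at_least(ratings, threshold):
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

ratings = [3, 2, 3, 1, 3]                 # one user's top-5 recommendations
prec3 = precision_at_least(ratings, 3)    # 0.6 -> "very relevant" only
prec2 = precision_at_least(ratings, 2)    # 0.8 -> "2 or 3" ratings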
We observe that the system manages to recommend 60% or more very relevant recommendations, while the accuracy reaches 100% when the somewhat relevant recommendations are also included. The accuracy increases further when the “cold-start” users (i.e. users with only 1 or 2 reviews, contributing 48% of the subset) are removed. We noticed that most of the cases where the system failed to generate useful recommendations were those where the style of the review was sarcastic and/or focused on non-trivial issues (e.g. servers engaged in a fight). Moreover, as the aspects currently used are very high-level, the results did not capture specific food preferences of the users (e.g. vegan vs. meat lover). On the other hand, the algorithm has been quite successful in identifying priorities such as atmosphere, service quality, drink options, etc. As a reference, the number of individual user reviews for this subset ranged from 1 to 36 (mean = 4.7, median = 3).

5. RELATED WORK

Many interesting works exist that focus on extracting opinions from customer reviews [5]. The most recent ones employ features as an additional tool in representing the semantic orientation of a review [1, 2, 4]. This is an important line of work that provides very useful input for the creation of the rich user profiles of our system. The algorithm we introduce in this paper is along the same lines; however, we should note that any similar approach could be easily integrated in our system.

None of the major web sites that include reviews as an indispensable part of their business provide aspect-oriented and personalized review rankings. For instance, Amazon ranks reviews by helpfulness (the number of “helpful” votes received) without providing any summary of the reviews other than the overall star rating. Netflix’s rating system is also mainly based on the star ratings, whereas Google Shopping allows users to create a list of pros and cons in addition to the review, but ranks reviews based on their date. Finally, Yelp, whose dataset we are employing in this study, ranks reviews by helpfulness. It also provides an overall summary for each business in terms of several aspects (e.g. friendly for kids, romantic, etc.), as well as a short summary of the most common comments in the reviews. The last two companies have some underlying social network that is not, however, utilized in re-ranking or personalizing the reviews.

Similarly, not much work has been done in the research community. The problem of using helpfulness as a way to rank results is discussed in [3]. The authors conclude that for experience goods, users prefer a brief description of the “objective” elements of the item and then a subjective positioning, described by aspects not captured by the product description. Our work not only addresses these findings, but also proposes ways of personalizing the rankings for each user, taking their social network into consideration as well. Helpfulness is also used in [6] as a way to filter out interesting reviews. This work addresses the same problem in a somewhat different way: the authors employ the feedback given by the community, in terms of how helpful one’s reviews are, along with several other content-, social-, and sentiment-based features, in order to classify a review as helpful or not. The main differences with our approach are that their sentiment is based on explicit sub-ratings given by the users to several predetermined aspects of a service, and that the authors assume that a “helpfulness” vote exists for each review in the dataset.

6. CONCLUSIONS

The amount of online reviews for products and services has grown to such an extent that it is often impossible to read all of them. In this work we propose a system that personalizes the order in which reviews are shown and provides an intuitive interface that allows users to see the important aspects of each review at a glance. An initial evaluation shows promising results. As part of our future work, we plan to further integrate these two types of recommendations and enhance them by introducing trust-based and reputation metrics. We also plan to perform a more extensive evaluation of the usefulness of such reordering.

7. REFERENCES

[1] X. Ding, B. Liu, P. S. Yu, A holistic lexicon-based approach to opinion mining, in Proc. of WSDM ’08.
[2] M. Eirinaki, S. Pisal, J. Singh, Feature-based Opinion Mining and Ranking, J. of Computer and System Sciences (JCSS), 78(4), pp. 1175-1184, July 2012.
[3] A. Ghose, P. Ipeirotis, Designing Novel Review Ranking Systems: Predicting the Usefulness and Impact of Reviews, in Proc. of ICEC ’07.
[4] H. Guo, H. Zhu, Z. Guo, X. Zhang, Z. Su, Address standardization with latent semantic association, in Proc. of ACM KDD ’09.
[5] B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, May 2012.
[6] M. P. O’Mahony, B. Smyth, A classification-based review recommender, Knowledge-Based Systems, 23(4), pp. 323-329, May 2010.
[7] S. Roohi, V. Suresh, M. Eirinaki, Aspect based Opinion Mining and Recommendation System for Restaurant Reviews, demo paper, in Proc. of ACM RecSys 2014.
[8] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis, in Proc. of HLT-EMNLP 2005.