=Paper=
{{Paper
|id=None
|storemode=property
|title=Using Social Data for Personalizing Review Rankings
|pdfUrl=https://ceur-ws.org/Vol-1271/Paper5.pdf
|volume=Vol-1271
|dblpUrl=https://dblp.org/rec/conf/recsys/SureshREV14
}}
==Using Social Data for Personalizing Review Rankings==
Vaishak Suresh*, Syeda Roohi*, Magdalini Eirinaki (Computer Engineering Department, San Jose State University, CA, USA) and Iraklis Varlamis (Department of Informatics and Telematics, Harokopio University of Athens, Greece)

*Vaishak Suresh and Syeda Roohi are currently affiliated with Intuit Inc. and HP, respectively.

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster City, CA, USA. Copyright held by the authors.
ABSTRACT

Almost all users look at online ratings and reviews before buying a product, visiting a business, or using a service. These reviews are independent, authored by other users, and thus may convey useful information to the end user. Reviews usually have an overall rating, but most of the time there are sub-texts in the review body that describe certain features/aspects of the product. The majority of web sites rank these reviews either by date or by overall “helpfulness”. However, different users look for different qualities in a product/business/service. In this work, we address this problem by proposing a system that creates personalized rankings of these reviews, tailored to each individual user. We discuss how social data, ratings, and reviews can be combined to create this personalized experience. We present our work in progress using the Yelp Challenge dataset and discuss some first findings regarding implementation and scalability.

Keywords

Personalization; recommendation engine; feature ranking; sentiment analysis

1. INTRODUCTION

With more and more businesses selling their products or advertising their services online, customers rely on word of mouth in the form of customer reviews to make their decisions. Most of the current websites that feature product/business/service reviews list these reviews in reverse chronological order, or by employing heuristic metrics (e.g. ranking higher the reviews of “super users”, i.e. users with many reviews, or those with the most “helpful” votes). However, such a generic ranking requires users to read, or at least scan, the tens or hundreds of reviews for one product/business/service.

Moreover, different people value different aspects of the same product/business/service. For example, when searching for a digital camera, one user might be interested in the price and size, whereas another may value the ease of use. Similarly, when searching for a good Italian restaurant, one user might value the ambience and wine list of a place, while another might prefer restaurants that are family-friendly. Ideally, users would like to be presented with only those reviews that highlight the qualities of a product/service that they value.

In this work, we present a system framework that addresses the above issue. In a nutshell, we create user profiles that reflect each user’s preferences for specific restaurants and restaurant qualities (e.g. food, ambience, etc.). The profiles are created using the rating data, as well as implicit preferences identified by applying aspect-based opinion mining to the reviews. Using these profiles, we identify similar users and rank their reviews for new restaurants higher. We also integrate the social network of the user, identifying those friends who have similar preference patterns with the active user, and highlight their reviews. Therefore, for the same restaurant, two different users will see a different list of reviews. The system is accompanied by a user-friendly interface that also highlights the main aspects of each review, so that the user does not have to read the full text. To achieve this, we employ aspect-based opinion mining and neighborhood-based collaborative filtering techniques and integrate them in our system.

We also present a system prototype, built using the Yelp dataset (http://www.yelp.com/dataset_challenge/), to demonstrate a first approach to this interesting problem [7]. Without loss of generality, we focus on restaurant review recommendations; however, our approach is easily extended to any other product/business/service, as long as reviews, ratings, and an underlying social network are available. The personalized presentation of reviews is a subjective matter and therefore very hard to evaluate without involving real users, but we provide some first empirical results. We should stress that this is a work in progress, and our focus in this paper is to introduce this mash-up idea along with an initial approach to the problem, as well as our thoughts on how such a system could be further enhanced.

The rest of the paper is organized as follows: we present our system’s design in detail in Section 2. We provide a first-cut approach to extending the proposed model using social network connections and feedback in Section 3. Some discussion of the prototype implementation and evaluation is included in Section 4. An overview of the related work is provided in Section 5, and we conclude with our plans for future work in Section 6.

2. SYSTEM DESIGN

The system architecture is shown in Figure 1. It comprises two main modules: an offline processing module, where the user profiles are generated and the feature extraction and rating happen, and an online module, which generates real-time recommendations.

2.1 Offline Processing

There are two phases of offline processing: aspect summarization and user preference generation.

2.1.1 Aspect Summarization

This module aims at extracting the important features from each review, along with their polarity weight. To perform this, we employ the subjectivity lexicon [8] in order to map weak and strong positive and negative words to numeric values (ranging from -4 to +4). Using a master list of positive and negative opinion words from an opinion lexicon [5], we created a list of negation words (not, no, nothing, etc.), which invert the sentiment, and intensifiers (too, very, so, etc.), which increase the intensity of the sentiment (these are referred to as “TOO words” in our algorithm).
More specifically, the words of each review are tagged using the default POS (part-of-speech) tagger from NLTK (http://www.nltk.org), a natural language processing Python package; this is done using the Treebank corpus. The text, augmented with tags, is then split into sentences and then into words. Each word is then examined to determine its type.

[Figure 1: System Architecture]

If a word is POS-tagged as an adverb or an adjective, it is considered an opinion word. If the opinion word is POS-tagged as superlative or comparative, the score is set to the maximum (+4) or minimum (-4), based on the polarity. During this process, the words that modify the polarity (e.g. “not”) and the degree (e.g. “too”, “very”) are also considered when scoring the opinion word; the presence of these words can invert or increase the sentiment score of the aspect, respectively. The words POS-tagged as nouns are potential candidates for feature words. Apart from using the pre-defined feature lookup file, these words are also tested for synonyms using the WordNet interface in NLTK.

Once the features and opinion words in a sentence are determined, each feature is mapped to an opinion word based on the distance between them. The aggregated opinion score for each feature is calculated over all the sentences in the review, as described above, and the review document is updated with these values in the system’s database. The algorithm performing this process is outlined in Figure 2.
For each sentence in the review
  For each word in the sentence
    if the POS of the word is Adverb (RB) and it is a TOO word
      Save the TOO word position
      if the word is 'and'
        Continue the TOO rule
    # opinion word
    if the word is in the subjectivity lexicon or in the master list
      if the POS tag is a superlative adverb (RBS) or adjective (JJS)
        Set the superlative flag
      if the POS tag is a comparative adverb (RBR) or adjective (JJR)
        Set the comparative flag
      if the word is in the subjectivity lexicon
        Set the word score
      else if the word is in the positive master list
        Set the word score to +1
      else if the word is in the negative master list
        Set the word score to -1
      if a TOO word exists and is adjacent
        if the word is positive, increase the score by 1
        if the word is negative, decrease the score by 1
      if the superlative flag is set
        if the word is positive, set the score to +4
        if the word is negative, set the score to -4
      if the comparative flag is set
        if the word is positive, set the score to +3
        if the word is negative, set the score to -3
      if the opinion word is in a negative context
        negate the sentiment of the score
      Save the opinion_word_position and score
    # feature word
    if the POS of the word is a Noun (NN)
      if the word is in the feature list or is a synonym of a feature
        Save the feature and position
  Apply the opinion scores to the potential features in the sentence
  Aggregate the score for each feature

Figure 2: Opinion score assignment algorithm
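To make the flow of Figure 2 concrete, the following is a minimal Python sketch of the scoring pass. It assumes the review has already been POS-tagged (e.g. with NLTK’s pos_tag); the lexicons, the TOO-word list, and the feature list below are illustrative placeholders for the resources of [5] and [8], and negation is simplified to the sentence level.

# Minimal sketch of the Figure 2 scoring pass. The lexicons below are
# placeholders for the subjectivity lexicon [8] and the opinion-word
# master lists [5]; negation is simplified to the sentence level.

NEGATIONS = {"not", "no", "nothing", "never"}          # invert sentiment
TOO_WORDS = {"too", "very", "so", "really"}            # intensify sentiment
SUBJECTIVITY = {"great": 3, "awful": -3, "decent": 1}  # word -> score in [-4, +4]
POSITIVE_MASTER = {"good", "tasty"}
NEGATIVE_MASTER = {"bad", "bland"}
FEATURES = {"food", "service", "ambience", "price"}

def score_review(tagged_sentences):
    """tagged_sentences: one list of (word, POS tag) pairs per sentence,
    e.g. from nltk.pos_tag(). Returns {feature: aggregated opinion score}."""
    feature_scores = {}
    for sentence in tagged_sentences:
        opinions, features, too_positions = [], [], set()
        negated = any(w.lower() in NEGATIONS for w, _ in sentence)
        for idx, (word, pos) in enumerate(sentence):
            w = word.lower()
            if pos == "RB" and w in TOO_WORDS:
                too_positions.add(idx)                 # save TOO word position
            if w in SUBJECTIVITY or w in POSITIVE_MASTER | NEGATIVE_MASTER:
                score = SUBJECTIVITY.get(w, 1 if w in POSITIVE_MASTER else -1)
                if pos in ("RBS", "JJS"):              # superlative: max polarity
                    score = 4 if score > 0 else -4
                elif pos in ("RBR", "JJR"):            # comparative: +/-3
                    score = 3 if score > 0 else -3
                elif idx - 1 in too_positions:         # adjacent TOO word
                    score += 1 if score > 0 else -1
                if negated:                            # negative context
                    score = -score
                opinions.append((idx, score))
            if pos.startswith("NN") and w in FEATURES: # candidate feature word
                features.append((idx, w))
        for f_idx, feat in features:                   # map feature to nearest opinion
            if opinions:
                _, sc = min(opinions, key=lambda o: abs(o[0] - f_idx))
                feature_scores[feat] = feature_scores.get(feat, 0) + sc
    return feature_scores

# score_review([[("The", "DT"), ("food", "NN"), ("was", "VBD"),
#                ("very", "RB"), ("good", "JJ")]])  -> {"food": 2}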
2.1.2 User Profile Generation

In order to generate personalized review rankings, we follow a neighborhood-based collaborative filtering approach. Given a user $u$ and the set of businesses they have rated and/or reviewed, $B_u$, each user is represented by a profile vector $U = (p_1, \ldots, p_k)$, where $p_i$ denotes the preference of the user for a business $i$, and $k$ is the total number of businesses in the system. We define $p_i$ as follows:

$$p_i = \begin{cases} s_{ui} & \text{if } i \in B_u \\ 0 & \text{if } i \notin B_u \end{cases} \qquad (1)$$

where $s_{ui}$ represents the cumulative preference score of user $u$ for business $i$, calculated using their overall rating or their opinion on specific aspects of the business, as identified from their review of it. We introduce three alternative ways of calculating the preference score, namely using only the rating of the user, using the opinion scores of the specific review, or weighing them by the overall preference/dislike of the user for each aspect, as shown in Equations 2, 3, and 4, respectively.

Rating-based preference score:

$$s_{ui} = r_{ui} \qquad (2)$$

where $r_{ui}$ denotes the star rating of user $u$ for business $i$.

Business-based preference score:

$$s_{ui} = \sum_{a \in R_{ui}} o_a \qquad (3)$$

where $R_{ui}$ denotes the set of aspects included in the review of user $u$ for business $i$, and $o_a$ is the opinion score calculated for aspect $a$ in this particular review.

Review-based preference score:

$$s_{ui} = \sum_{a \in R_{ui}} w_{ua} \cdot o_a \qquad (4)$$

where $w_{ua}$ denotes the overall preference/dislike of user $u$ for aspect $a$, as expressed by their opinions in all the reviews they have written. This can be calculated as the normalized sum of all the scores $o_a$ over all the reviews $R_u$.
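To make Equations 2-4 concrete, the sketch below computes the three preference-score variants for a single review. The {aspect: opinion score} review representation, and normalizing $w_{ua}$ by the total absolute aspect score, are our assumptions for illustration (the text only specifies a normalized sum).

# The three preference-score variants of Equations 2-4 for one review.
# The {aspect: opinion score} review representation and the
# normalization used for w_ua are illustrative assumptions.

def rating_score(star_rating):
    """Equation 2: s_ui is simply the star rating r_ui."""
    return star_rating

def business_score(review_aspects):
    """Equation 3: sum of the opinion scores o_a over the aspects R_ui."""
    return sum(review_aspects.values())

def aspect_weights(user_reviews):
    """w_ua: normalized sum of the scores o_a over all reviews R_u."""
    totals = {}
    for review in user_reviews:
        for aspect, score in review.items():
            totals[aspect] = totals.get(aspect, 0.0) + score
    norm = sum(abs(v) for v in totals.values()) or 1.0
    return {a: v / norm for a, v in totals.items()}

def review_score(review_aspects, weights):
    """Equation 4: aspect scores weighted by the user's overall w_ua."""
    return sum(weights.get(a, 0.0) * o for a, o in review_aspects.items())

# Example: a user with two reviews; p_i for a reviewed business i
reviews = [{"food": 3, "service": -1}, {"food": 2, "ambience": 1}]
w = aspect_weights(reviews)
p_i = review_score(reviews[0], w)   # per Equation 1, p_i = 0 if i not in B_u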
Once the user profiles are created, we employ a user-based collaborative filtering technique to find similar users. In our implementation, we have used the Pearson correlation coefficient and the open-source libraries provided by Apache Mahout.
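The prototype delegates the similarity computation to Mahout; the plain-Python sketch below only illustrates the underlying Pearson correlation between two profile vectors, computed over the businesses both users have scored (the sparse dictionary representation is an assumption).

# Pearson correlation between two sparse profile vectors, computed over
# the businesses both users have scored. The prototype obtains this from
# Apache Mahout; this version only illustrates the formula.
from math import sqrt

def pearson(profile_a, profile_b):
    """profile_a, profile_b: {business_id: preference score}."""
    common = profile_a.keys() & profile_b.keys()
    if len(common) < 2:
        return 0.0                       # not enough overlap to correlate
    mean_a = sum(profile_a[i] for i in common) / len(common)
    mean_b = sum(profile_b[i] for i in common) / len(common)
    cov = sum((profile_a[i] - mean_a) * (profile_b[i] - mean_b) for i in common)
    var_a = sum((profile_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((profile_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0                       # a flat profile has no correlation
    return cov / sqrt(var_a * var_b)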
2.2 Online Recommendations

This step is used to rank and recommend reviews in real time, as the user navigates the system and searches for new restaurants. When a given user searches for a specific restaurant, the recommendation engine computes the similarity of the current user with all the reviewers of the particular business, and ranks and presents the related reviews in descending order of similarity. As a result, each user will be presented with a different set of reviews for the same business.

Moreover, the interface allows the end user to get the gist of the reviews without the need to read the entire review text. For each review, the overall star rating, as well as the most important aspects of the review, are prominently shown. The aspects are intuitively marked as strong/weak positive/negative, using colors and thumbs-up/down images. We should stress that the same aspect might appear in more than one review, and one review might contain more than one aspect.
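As an illustration of this online step, a minimal sketch that orders a business’s reviews by the similarity of each reviewer to the active user; pearson() and the profile dictionaries are those of the previous sketch, and the review tuples are an assumed structure.

# Rank the reviews of one business for the active user: reviews written
# by the most similar users come first. Assumes the pearson() function
# and the profile dictionaries sketched earlier.

def rank_reviews(active_profile, reviews, profiles):
    """reviews: list of (reviewer_id, review_text) tuples;
    profiles: {user_id: {business_id: preference score}}."""
    return sorted(reviews,
                  key=lambda rev: pearson(active_profile,
                                          profiles.get(rev[0], {})),
                  reverse=True)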
3. SOCIAL NETWORK FEEDBACK

When available, information related to the user’s social network can be incorporated in our model. There are two alternative ways this can be done: either at the last step of the process, or during profile generation.

In the first case, the similarity between the user and their friends is calculated when the user searches for the restaurant. The friends’ reviews for this restaurant are separately ranked and presented in a different list, so that they are easily identifiable.

In the second case, the user preferences are weighed by the opinion scores of the user’s friends. To incorporate the social network feedback in the model, we extend Equation 1 as follows:

$$p_i = \begin{cases} w_{F_u i} \cdot s_{ui} & \text{if } i \in B_u \\ w_{F_u i} & \text{if } i \notin B_u \end{cases} \qquad (5)$$

where $F_u$ is the set of friends of user $u$, and $w_{F_u i}$ can be defined as follows:

$$w_{F_u i} = \frac{\sum_{f \in F_u} s_{fi}}{|F_u|} \qquad (6)$$

Equation 6 can be easily extended to incorporate the similarities between users.

Note that in this extension we also address the cold-start problem, since the user profile can be filled in through social network feedback, even when the user has few or no reviews/ratings in the system.
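For illustration, a minimal sketch of Equations 5 and 6, under the assumption that each friend’s profile is available as a dictionary of preference scores (the data structures are ours, not the prototype’s):

# Sketch of the social extension: Equation 6 averages the friends'
# preference scores for a business, and Equation 5 uses that weight to
# scale the user's own score, or to stand in for it when the user has
# not reviewed the business (this is what mitigates cold start).

def friend_weight(friend_profiles, business):
    """Equation 6: w_Fu,i as the mean preference score of the friends."""
    if not friend_profiles:
        return 0.0
    return sum(p.get(business, 0.0) for p in friend_profiles) / len(friend_profiles)

def social_profile_entry(own_score, friend_profiles, business, reviewed):
    """Equation 5: p_i = w_Fu,i * s_ui if i is in B_u, else w_Fu,i."""
    w = friend_weight(friend_profiles, business)
    return w * own_score if reviewed else w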
4. PROTOTYPE EVALUATION

We have already implemented a prototype based on the system design described in the previous sections, using the Yelp dataset. Our prototype implements the business-based preference profile, assuming that the product aspects are predetermined. A screenshot of our prototype is shown in Figure 5. Each review is accompanied by some metrics showing the calculated polarity and subjectivity of the review, as well as the similarity of each reviewer to the user. The end user may further refine the personalized list of reviews by filtering only those that come from his/her friends, or by feature (e.g. location, food, etc.). More technical details on the implementation are included in [7]; a screencast of the prototype is available at http://youtu.be/vMz5CobpIw4.

[Figure 5: Client application – personalized recommended reviews]

We have load-tested the prototype, deployed on a Tomcat server on a machine with the following configuration: Intel i5-2410M CPU @2.30 GHz, 64-bit OS, 4 GB RAM. As shown in Figure 3, the response time increases linearly with the number of users, and the system can handle multiple simultaneous requests in real time (the system crashed after 175 simultaneous requests, as MongoDB cannot handle that many connections).

[Figure 3: Response time (ms) per number of concurrent users (25 to 175)]

We also performed an empirical evaluation of the recommendations using the following methodology: we randomly picked 50 users and generated top-5 recommendations for a specific restaurant. We then asked human evaluators to rate each recommended review on the following scale: 1 = “irrelevant”, 2 = “somewhat relevant”, 3 = “very relevant”. To assign the rankings, the evaluators were asked to identify 2-4 aspects highlighted in each user’s review (the aspects were not identical to the ones used by our prototype; instead, the evaluators were asked to identify anything that stood out, e.g. that the user favors short reviews, or values price/service/food). If the recommended review included more than 50% of the aspects, it received a 3; if it was very uninformative or did not include any aspects, it received a 1; everything else received a 2. We employ precision as our evaluation metric and define Prec3 and Prec2, measuring how many recommendations received a “3”, or a “2 or 3”, rating, respectively.

[Figure 4: Average precision (Prec3 and Prec2), for all users and with users grouped by total number of reviews (1/2 reviews vs. 3+ reviews)]
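For clarity, Prec3 and Prec2 reduce to the following computation; the ratings below are made up for illustration.

# Minimal computation of the Prec3/Prec2 metrics from a list of
# evaluator ratings (1 = irrelevant, 2 = somewhat relevant,
# 3 = very relevant).

def precision_at_least(ratings, threshold):
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

ratings = [3, 2, 3, 1, 3]                 # one user's top-5 recommendations
prec3 = precision_at_least(ratings, 3)    # 0.6 -> "very relevant" only
prec2 = precision_at_least(ratings, 2)    # 0.8 -> "2 or 3" ratings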
We observe that the system manages to recommend 60% or more very relevant recommendations, while the accuracy reaches 100% when the somewhat relevant recommendations are also included. The accuracy increases further when the “cold-start” users (i.e. users with only 1 or 2 reviews, contributing 48% of the subset) are removed. We noticed that most of the cases where the system failed to generate useful recommendations were those where the style of the review was sarcastic and/or focused on non-trivial issues (e.g. servers engaged in a fight). Moreover, as the aspects currently used are very high-level, the results did not capture specific food preferences of the users (e.g. vegan vs. meat lover). On the other hand, the algorithm has been quite successful in identifying priorities such as atmosphere, service quality, drink options, etc. As a reference, the number of individual user reviews for this subset ranged from 1 to 36 (mean = 4.7, median = 3).

5. RELATED WORK

Many interesting works exist that focus on extracting opinions from customer reviews [5]. The most recent ones employ features as an additional tool in representing the semantic orientation of a review [1, 2, 4]. This is an important line of work that provides very useful input for the creation of the rich user profiles of our system. The algorithm we introduce in this paper is along the same lines; however, we should note that any similar approach could be easily integrated in our system.

None of the major web sites that include reviews as an indispensable part of their business provide aspect-oriented and personalized review rankings. For instance, Amazon ranks reviews by helpfulness (the number of “helpful” votes received) without providing any summary of the reviews other than the overall star rating. Netflix’s rating system is also mainly based on the star ratings, whereas Google Shopping allows users to create a list of pros and cons in addition to the review, but ranks reviews based on their date. Finally, Yelp, whose dataset we are employing in this study, ranks reviews by helpfulness. It also provides an overall summary for each business in terms of several aspects (e.g. friendly for kids, romantic, etc.), as well as a short summary of the most common comments in the reviews. The last two companies have some underlying social network that is not, however, utilized in re-ranking or personalizing the reviews.

Similarly, not much work has been done in the research community. The problem of using helpfulness as a way to rank results is discussed in [3]. The authors conclude that for experience goods, users prefer a brief description of the “objective” elements of the item and then a subjective positioning, described by aspects not captured by the product description. Our work not only addresses these findings, but also proposes ways of personalizing the rankings for each user, taking their social network into consideration as well. Helpfulness is also used in [6] as a way to filter out interesting reviews. This work addresses the same problem in a somewhat different way: the authors employ the feedback given by the community, in terms of how helpful one’s reviews are, along with several other content-, social-, and sentiment-based features, in order to classify a review as helpful or not. The main differences with our approach are that their sentiment is based on explicit sub-ratings given by the users to several predetermined aspects of a service, and that the authors assume that a “helpfulness” vote exists for each review in the dataset.

6. CONCLUSIONS

The amount of online reviews for products and services has grown to such an extent that it is often impossible to read all of them. In this work we propose a system that personalizes the order in which reviews are shown and provides an intuitive interface that allows users to see the important aspects of each review at a glance. An initial evaluation shows promising results. As part of our future work, we plan to further integrate these two types of recommendations and enhance them by introducing trust-based and reputation metrics. We also plan to perform a more extensive evaluation of the usefulness of such reordering.

7. REFERENCES

[1] X. Ding, B. Liu, P. S. Yu, A holistic lexicon-based approach to opinion mining, in Proc. of WSDM ’08.
[2] M. Eirinaki, S. Pisal, J. Singh, Feature-based Opinion Mining and Ranking, J. of Computer and System Sciences (JCSS), 78(4), pp. 1175-1184, July 2012.
[3] A. Ghose, P. Ipeirotis, Designing Novel Review Ranking Systems: Predicting the Usefulness and Impact of Reviews, in Proc. of ICEC ’07.
[4] H. Guo, H. Zhu, Z. Guo, X. Zhang, Z. Su, Address standardization with latent semantic association, in Proc. of ACM KDD ’09.
[5] B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, May 2012.
[6] M. P. O’Mahony, B. Smyth, A classification-based review recommender, Knowledge-Based Systems, 23(4), pp. 323-329, May 2010.
[7] S. Roohi, V. Suresh, M. Eirinaki, Aspect based Opinion Mining and Recommendation System for Restaurant Reviews, demo paper, in Proc. of ACM RecSys 2014.
[8] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis, in Proc. of HLT-EMNLP 2005.