Using Social Data for Personalizing Review Rankings

Vaishak Suresh*, Syeda Roohi*, Magdalini Eirinaki
Computer Engineering Department, San Jose State University, CA, USA

Iraklis Varlamis
Department of Informatics and Telematics, Harokopio University of Athens, Greece

* Vaishak Suresh and Syeda Roohi are currently affiliated with Intuit Inc. and HP, respectively.

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster City, CA, USA. Copyright held by the authors.

ABSTRACT
Almost all users look at online ratings and reviews before buying a product, visiting a business, or using a service. These reviews are independent, authored by other users, and thus may convey useful information to the end user. Reviews usually carry an overall rating, but most of the time the review body also contains sub-texts that describe specific features/aspects of the product. The majority of web sites rank these reviews either by date or by overall "helpfulness". However, different users look for different qualities in a product/business/service. In this work, we address this problem by proposing a system that creates personalized rankings of these reviews, tailored to each individual user. We discuss how social data, ratings, and reviews can be combined to create this personalized experience. We present our work in progress using the Yelp Challenge dataset and discuss some first findings regarding implementation and scalability.

Keywords
Personalization; recommendation engine; feature ranking; sentiment analysis

1. INTRODUCTION
With more and more businesses selling their products or advertising their services online, customers rely on word of mouth in the form of customer reviews to make their decisions. Most current websites that feature product/business/service reviews list these reviews in reverse chronological order, or rank them with heuristic metrics (e.g. ranking higher the reviews of "super users", i.e. users with many reviews, or those with the most "helpful" votes). However, such a generic ranking requires users to read, or at least scan, the tens or hundreds of reviews for a single product/business/service.

Moreover, different people value different aspects of the same product/business/service. For example, when searching for a digital camera, one user might be interested in the price and size, whereas another may value the ease of use. Similarly, when searching for a good Italian restaurant, one user might value the ambience and wine list of a place, while another might prefer restaurants that are family-friendly. Ideally, users would like to be presented only with those reviews that highlight the qualities of a product/service that they value.

In this work, we present a system framework that addresses the above issue. In a nutshell, we create user profiles that reflect each user's preferences for specific restaurants and restaurant qualities (e.g. food, ambience, etc.). The profiles are created using the rating data as well as implicit preferences identified by applying aspect-based opinion mining to the reviews. Using these profiles, we identify similar users and rank their reviews for new restaurants higher. We also integrate the social network of the user, identifying those friends who have preference patterns similar to the active user's, and highlight their reviews. Therefore, for the same restaurant, two different users will see a different list of reviews. The system is accompanied by a user-friendly interface that also highlights the main aspects of each review, so that the user does not have to read the full text. To achieve this, we employ aspect-based opinion mining and neighborhood-based collaborative filtering techniques and integrate them in our system.

We also present a system prototype, built using the Yelp dataset (http://www.yelp.com/dataset_challenge/), to demonstrate a first approach to this interesting problem [7]. Without loss of generality, we focus on restaurant review recommendations; however, our approach is easily extended to any other product/business/service as long as reviews, ratings, and an underlying social network are available. The personalized presentation of reviews is a subjective matter and therefore very hard to evaluate without involving real users, but we provide some first empirical results. We should stress that this is a work in progress and our focus in this paper is to introduce this mash-up idea along with an initial approach to the problem, as well as our thoughts on how such a system could be further enhanced.

The rest of the paper is organized as follows: we present our system's design in detail in Section 2. We provide a first-cut approach to extending the proposed model using social network connections and feedback in Section 3. A discussion of the prototype implementation and evaluation is included in Section 4. An overview of the related work is provided in Section 5, and we conclude with our plans for future work in Section 6.
2. SYSTEM DESIGN
The system architecture is shown in Figure 1. It comprises two main modules: an offline processing module, where the user profiles are generated and the feature extraction and rating take place, and an online module, which generates real-time recommendations.

[Figure 1: System Architecture]

2.1 Offline Processing
There are two phases of offline processing, namely aspect summarization and user preference generation.

2.1.1 Aspect Summarization
This module aims at extracting the important features from each review, along with their polarity weight. To perform this, we employ the subjectivity lexicon [8] in order to map weak and strong positive and negative words to numeric values (ranging from -4 to +4). Using a master list of positive and negative opinion words from an opinion lexicon [5], we created a list of negation words (not, no, nothing, etc.), which invert the sentiment, and intensifiers (too, very, so, etc.), which increase the intensity of the sentiment (these are referred to as "TOO words" in our algorithm).

More specifically, the words of each review are tagged using the default POS (part-of-speech) tagger from NLTK (http://www.nltk.org), a natural language processing Python package; the tagger is trained on the Treebank corpus. The text augmented with tags is then split into sentences and then into words, and each word is examined to determine its type. If a word is POS-tagged as an adverb or an adjective, it is considered an opinion word. If the opinion word is POS-tagged as superlative or comparative, its score is set to the maximum (+4) or minimum (-4) based on its polarity. During this process, the words that modify the polarity (e.g. "not") and degree (e.g. "too", "very") are also considered when scoring the opinion word; the presence of these words can invert or intensify the sentiment score of the aspect, respectively. The words POS-tagged as nouns are potential candidates for feature words. Apart from matching them against a pre-defined feature lookup file, these words are also tested for synonyms of known features using the WordNet interface in NLTK.

Once the features and opinion words in a sentence are determined, each opinion word is mapped to a feature based on the distance between them. The aggregated opinion score for each feature is calculated over all the sentences in the review as described above, and the review document is updated with these values in the system's database. The algorithm performing this process is outlined in Figure 2.

For each sentence in the review
    For each word in the sentence
        # TOO (intensifier) words
        if the POS of the word is an adverb (RB) and it is a TOO word
            Save the TOO word position
        if the word is 'and'
            Continue the TOO rule
        # Opinion words
        if the word is in the subjectivity lexicon or in the master list
            if the POS tag is a superlative adverb (RBS) or adjective (JJS)
                Set the superlative flag
            if the POS tag is a comparative adverb (RBR) or adjective (JJR)
                Set the comparative flag
            if the word is in the subjectivity lexicon
                Set the word score from the lexicon
            else if the word is in the positive master list
                Set the word score to +1
            else if the word is in the negative master list
                Set the word score to -1
            if a TOO word exists and is adjacent
                if the word is positive, increase the score by 1
                if the word is negative, decrease the score by 1
            if the superlative flag is set
                if the word is positive, set the score to +4
                if the word is negative, set the score to -4
            if the comparative flag is set
                if the word is positive, set the score to +3
                if the word is negative, set the score to -3
            if the opinion word is in a negative context
                Negate the sentiment of the score
            Save the opinion word position and score
        # Feature words
        if the POS of the word is a noun (NN)
            if the word is in the feature list or is a synonym of a feature
                Save the feature and position
    Apply the opinion scores to the nearest potential features in the sentence
Aggregate the score for each feature

Figure 2: Opinion score assignment algorithm
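To make the scoring pass concrete, the following minimal Python sketch implements a simplified version of the Figure 2 loop. The miniature lexicons, the feature list, and the three-word negation window are illustrative assumptions made for this sketch only; the actual system uses the full subjectivity lexicon [8], the opinion lexicon of [5], and WordNet synonym expansion, all omitted here for brevity.

# A minimal sketch of the Figure 2 scoring loop over toy lexicons.
# Requires nltk, plus the 'punkt' and 'averaged_perceptron_tagger'
# resources (install via nltk.download(...)).
import nltk

# Hypothetical miniature lexicons for illustration only.
SUBJECTIVITY = {"great": 2, "awful": -2, "good": 1, "bad": -1}   # -4..+4 scale
POSITIVE_MASTER = {"tasty"}
NEGATIVE_MASTER = {"bland"}
NEGATIONS = {"not", "no", "nothing", "never"}
TOO_WORDS = {"too", "very", "so", "really"}
FEATURES = {"food", "service", "ambience", "price"}

def score_review(text):
    """Return {feature: aggregated opinion score} for one review."""
    feature_scores = {}
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        opinions, features = [], []        # (position, score) / (position, word)
        for pos, (word, tag) in enumerate(tagged):
            w = word.lower()
            if w in SUBJECTIVITY or w in POSITIVE_MASTER or w in NEGATIVE_MASTER:
                score = SUBJECTIVITY.get(w, 1 if w in POSITIVE_MASTER else -1)
                if tag in ("RBS", "JJS"):            # superlative -> extreme score
                    score = 4 if score > 0 else -4
                elif tag in ("RBR", "JJR"):          # comparative -> +/-3
                    score = 3 if score > 0 else -3
                elif pos > 0 and tagged[pos - 1][0].lower() in TOO_WORDS:
                    score += 1 if score > 0 else -1  # adjacent intensifier
                # Negation within a short preceding window (assumed width: 3).
                if any(t.lower() in NEGATIONS for t, _ in tagged[max(0, pos - 3):pos]):
                    score = -score
                opinions.append((pos, score))
            elif tag.startswith("NN") and w in FEATURES:
                features.append((pos, w))
        # Map each opinion word to the nearest feature word in the sentence.
        for opos, score in opinions:
            if features:
                _, feat = min(features, key=lambda f: abs(f[0] - opos))
                feature_scores[feat] = feature_scores.get(feat, 0) + score
    return feature_scores

print(score_review("The food was great but the service was too bad."))
# {'food': 2, 'service': -2}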
2.1.2 User Profile Generation
In order to generate personalized review rankings, we follow a neighborhood-based collaborative filtering approach. Given a user $u$ and the set of businesses $B_u$ they have rated and/or reviewed, each user is represented by a profile vector $U = (p_1, \dots, p_k)$, where $p_i$ denotes the preference of the user for business $i$ and $k$ is the total number of businesses in the system. We define $p_i$ as follows:

$$p_i = \begin{cases} s_{ui} & \text{if } i \in B_u \\ 0 & \text{if } i \notin B_u \end{cases} \quad (1)$$

where $s_{ui}$ represents the cumulative preference score of user $u$ for business $i$, calculated using their overall rating or their opinion on specific aspects of the business as identified in their review of it. We introduce three alternative ways of calculating the preference score, namely using only the rating of the user, using the opinion scores of the specific review, or weighing the opinion scores by the overall preference/dislike of the user for each aspect, as shown in Equations 2, 3, and 4, respectively.

Rating-based preference score:
$$s_{ui} = r_{ui} \quad (2)$$
where $r_{ui}$ denotes the star rating of user $u$ for business $i$.

Business-based preference score:
$$s_{ui} = \sum_{a \in R_{ui}} o_{ua} \quad (3)$$
where $R_{ui}$ denotes the set of aspects included in the review of user $u$ for business $i$, and $o_{ua}$ is the opinion score calculated for aspect $a$ in this particular review.

Review-based preference score:
$$s_{ui} = \sum_{a \in R_{ui}} w_{ua} \cdot o_{ua} \quad (4)$$
where $w_{ua}$ denotes the overall preference/dislike of user $u$ for aspect $a$, as expressed by their opinions across all the reviews $R_u$ they have written. It can be calculated as the normalized sum of all the scores $o_{ua}$ over the reviews in $R_u$.
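As an illustration of Equations 2-4, the sketch below computes the three alternative preference scores for a toy user, assuming the per-review aspect opinion scores $o_{ua}$ have already been produced by the aspect summarization module. The dictionary-based data layout is hypothetical, and since the exact normalization of $w_{ua}$ is left open above, we normalize by the sum of absolute aspect scores as one plausible choice.

# Example per-user data: star ratings (r_ui) and per-aspect opinion
# scores (o_ua) per reviewed business. Layout is illustrative.
ratings = {"biz1": 4, "biz2": 2}
opinions = {"biz1": {"food": 3, "service": -1},
            "biz2": {"price": -2}}

def rating_based(biz):                                  # Eq. 2
    return ratings[biz]

def business_based(biz):                                # Eq. 3
    return sum(opinions[biz].values())

def review_based(biz, aspect_weights):                  # Eq. 4
    return sum(aspect_weights.get(a, 0.0) * o for a, o in opinions[biz].items())

# w_ua: the user's summed opinion score per aspect over all their reviews,
# normalized here by the total absolute score (one plausible choice).
totals = {}
for biz_opinions in opinions.values():
    for aspect, score in biz_opinions.items():
        totals[aspect] = totals.get(aspect, 0.0) + score
norm = sum(abs(v) for v in totals.values()) or 1.0
aspect_weights = {a: v / norm for a, v in totals.items()}

# Profile vector entry p_i (Eq. 1): s_ui if the business was reviewed, else 0.
all_businesses = ["biz1", "biz2", "biz3"]
profile = [review_based(b, aspect_weights) if b in opinions else 0.0
           for b in all_businesses]
print(profile)   # [~1.67, ~0.67, 0.0]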
Once the user profiles are created, we employ a user-based collaborative filtering technique to find similar users. In our implementation, we use the Pearson correlation coefficient and the open source libraries provided by Apache Mahout.

2.2 Online Recommendations
This step ranks and recommends reviews in real time, as the user navigates the system and searches for new restaurants. When a given user searches for a specific restaurant, the recommendation engine computes the similarity of the current user with all the reviewers of that particular business, then ranks and presents the related reviews in descending order of similarity. As a result, each user will be presented with a different ordering of the reviews for the same business.

Moreover, the interface allows the end user to get the gist of the reviews without reading the entire review text. For each review, the overall star rating as well as the most important aspects of the review are prominently shown. The aspects are intuitively marked as strong/weak positive/negative, using colors and thumbs-up/down images. We should stress that the same aspect might appear in more than one review, and one review might contain more than one aspect.

3. SOCIAL NETWORK FEEDBACK
When available, information related to the user's social network can be incorporated in our model. There are two alternative ways this can be done: either at the last step of the process, or during profile generation.

In the first case, the similarity between the user and their friends is calculated when the user searches for the restaurant. The friends' reviews for this restaurant are ranked separately and presented in a different list, so that they are easily identifiable.

In the second case, the user preferences are weighed by the opinion scores of the user's friends. To incorporate the social network feedback in the model, we extend Equation 1 as follows:

$$p_i = \begin{cases} w_{F_u i} \cdot s_{ui} & \text{if } i \in B_u \\ w_{F_u i} & \text{if } i \notin B_u \end{cases} \quad (5)$$

where $F_u$ is the set of friends of user $u$, and $w_{F_u i}$ can be defined as follows:

$$w_{F_u i} = \frac{\sum_{f \in F_u} s_{fi}}{|F_u|} \quad (6)$$

Equation 6 can be easily extended to incorporate the similarities between users.

Note that in this extension we also address the cold-start problem, since the user profile can be filled in by social network feedback even when the user has few or no reviews/ratings in the system.
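A brief sketch of Equations 5 and 6 follows, assuming the friends' preference scores $s_{fi}$ have been computed as in Section 2.1.2; the function and variable names are ours.

def social_weight(friend_scores):
    """Eq. 6: average of the friends' preference scores for business i."""
    return sum(friend_scores) / len(friend_scores) if friend_scores else 0.0

def profile_entry(s_ui, friend_scores, reviewed):
    """Eq. 5: weigh the user's own score, or fall back to the friends' average."""
    w = social_weight(friend_scores)
    return w * s_ui if reviewed else w

# A cold-start user (no own review of business i) still gets a non-zero entry.
print(profile_entry(s_ui=0.0, friend_scores=[3.5, 2.0, 4.0], reviewed=False))  # ~3.17

Note how a cold-start user, who has not reviewed business $i$, still receives a non-zero profile entry from the friends' average.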
4. PROTOTYPE EVALUATION
We have implemented a prototype based on the system design described in the previous sections, using the Yelp dataset. Our prototype implements the business-based preference profile, assuming that the product aspects are predetermined. A screenshot of our prototype is shown in Figure 5. Each review is accompanied by metrics showing the calculated polarity and subjectivity of the review, as well as the similarity of the reviewer to the user. The end user may further refine the personalized list of reviews by filtering only those that come from his/her friends, or by feature (e.g. location, food, etc.). More technical details on the implementation are included in [7]; a screencast of the prototype is available at http://youtu.be/vMz5CobpIw4.

[Figure 5: Client application – personalized recommended reviews]

We have load-tested the prototype, deployed on a Tomcat server on a machine with the following configuration: Intel i5-2410M CPU @2.30 GHz, 64-bit OS, 4 GB RAM. As shown in Figure 3, the response time increases linearly with the number of users, and the system can handle multiple simultaneous requests in real time (it crashed after 175 simultaneous requests, as MongoDB could not handle that many connections).

[Figure 3: Response time (ms) vs. number of concurrent users (25 to 175)]

We also performed an empirical evaluation of the recommendations using the following methodology: we randomly picked 50 users and generated top-5 recommendations for a specific restaurant. We then asked human evaluators to rate each recommended review on the following scale: 1 = "irrelevant", 2 = "somewhat relevant", 3 = "very relevant". To assign the ratings, the evaluators were asked to identify 2-4 aspects highlighted in each user's reviews. These aspects were not identical to the ones used by our prototype; instead, the evaluators were asked to identify anything that stood out (e.g. that the user favors short reviews, or values price/service/food). If the recommended review included more than 50% of those aspects it received a 3; if it was very uninformative or did not include any of the aspects it received a 1; everything else received a 2. We employ precision as our evaluation metric and define Prec3 and Prec2, measuring how many recommendations received a "3", or a "2 or 3" rating, respectively (a short sketch of this computation is given at the end of this section).

[Figure 4: Average precision (Prec3 and Prec2) for all users, for users with 1-2 reviews, and for users with 3+ reviews]

We observe that the system manages to recommend 60% or more very relevant reviews, while the accuracy reaches 100% when the somewhat relevant recommendations are also counted. The accuracy increases further when the "cold-start" users (i.e. users with only 1 or 2 reviews, contributing 48% of the subset) are removed. As a reference, the number of individual user reviews for this subset ranged from 1 to 36 (mean = 4.7, median = 3). We noticed that most of the times the system failed to generate useful recommendations were when the style of the review was sarcastic and/or focused on non-trivial issues (e.g. servers engaged in a fight). Moreover, as the aspects currently used are very high-level, the results did not capture specific food preferences of the users (e.g. vegan vs. meat lover). On the other hand, the algorithm has been quite successful in identifying priorities such as atmosphere, service quality, drink options, etc.
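For clarity, the sketch below shows how Prec3 and Prec2 can be computed from the evaluators' ratings; the sample ratings are hypothetical.

def precision_at_least(ratings, threshold):
    """Fraction of recommendations rated at or above the given relevance level."""
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

ratings = [3, 2, 3, 1, 3]              # hypothetical ratings for one top-5 list
print(precision_at_least(ratings, 3))  # Prec3 = 0.6
print(precision_at_least(ratings, 2))  # Prec2 = 0.8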
5. RELATED WORK
Many interesting works focus on extracting opinions from customer reviews [5]. The most recent ones employ features as an additional tool in representing the semantic orientation of a review [1, 2, 4]. This is an important line of work that provides very useful input for the creation of the rich user profiles of our system. The algorithm we introduce in this paper is along the same lines; however, we should note that any similar approach could be easily integrated in our system.

None of the major web sites that include reviews as an indispensable part of their business provide aspect-oriented and personalized review rankings. For instance, Amazon ranks reviews by helpfulness (the number of "helpful" votes received) without providing any summary of the reviews other than the overall star rating. Netflix's rating system is also mainly based on star ratings, whereas Google Shopping allows users to create a list of pros and cons in addition to the review, but ranks the reviews by date. Finally, Yelp, whose dataset we employ in this study, ranks reviews by helpfulness. It also provides an overall summary for each business in terms of several aspects (e.g. friendly for kids, romantic, etc.), as well as a short summary of the most common comments in the reviews. The last two companies have an underlying social network that is not, however, utilized in re-ranking or personalizing the reviews.

Similarly, not much work has been done in the research community. The problem of using helpfulness as a way to rank reviews is discussed in [3]. The authors conclude that for experience goods, users prefer a brief description of the "objective" elements of the item followed by a subjective positioning, described by aspects not captured by the product description. Our work not only addresses these findings, but also proposes ways of personalizing the rankings for each user, taking their social network into consideration as well. Helpfulness is also used in [6] as a way to filter out interesting reviews. That work addresses the same problem in a somewhat different way: the authors employ the feedback given by the community, in terms of how helpful one's reviews are, along with several other content-, social-, and sentiment-based features, in order to classify a review as helpful or not. The main differences from our approach are that their sentiment is based on explicit sub-ratings given by the users to several predetermined aspects of a service, and that the authors assume a "helpfulness" vote exists for each review in the dataset.

6. CONCLUSIONS
The amount of online reviews for products and services has grown to such an extent that it is often impossible to read all of them. In this work we propose a system that personalizes the order in which reviews are shown and provides an intuitive interface that allows users to see the important aspects of each review at a glance. An initial evaluation shows promising results. As part of our future work we plan to integrate these two types of recommendations further and enhance them by introducing trust and reputation metrics. We also plan to perform a more extensive evaluation of the usefulness of such reordering.
7. REFERENCES
[1] X. Ding, B. Liu, P. S. Yu, A holistic lexicon-based approach to opinion mining, in Proc. of WSDM '08.
[2] M. Eirinaki, S. Pisal, J. Singh, Feature-based Opinion Mining and Ranking, Journal of Computer and System Sciences (JCSS), 78(4), pp. 1175-1184, July 2012.
[3] A. Ghose, P. Ipeirotis, Designing Novel Review Ranking Systems: Predicting the Usefulness and Impact of Reviews, in Proc. of ICEC '07.
[4] H. Guo, H. Zhu, Z. Guo, X. Zhang, Z. Su, Address standardization with latent semantic association, in Proc. of ACM KDD '09.
[5] B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, May 2012.
[6] M. P. O'Mahony, B. Smyth, A classification-based review recommender, Knowledge-Based Systems, 23(4), pp. 323-329, May 2010.
[7] S. Roohi, V. Suresh, M. Eirinaki, Aspect based Opinion Mining and Recommendation System for Restaurant Reviews, demo paper, in Proc. of ACM RecSys 2014.
[8] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis, in Proc. of HLT-EMNLP 2005.