<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM RecSys</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Using Social Data for Personalizing Review Rankings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Vaishak Suresh*, Syeda Roohi*, Magdalini Eirinaki (* currently affiliated with Intuit Inc. and HP, respectively)</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iraklis Varlamis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Engineering Department, San Jose State University</institution>
          ,
          <addr-line>CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Informatics and Telematics, Harokopio University of Athens</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>10</volume>
      <abstract>
        <p>Almost all users look at online ratings and reviews before buying a product, visiting a business, or using a service. These reviews are independent, authored by other users, and thus may convey useful information to the end user. Reviews usually have an overall rating, but most of the time there are sub-texts in the review body that describe certain features/aspects of the product. The majority of web sites rank these reviews either by date or by overall “helpfulness”. However, different users look for different qualities in a product/business/service. In this work, we address this problem by proposing a system that creates personalized rankings of these reviews, tailored to each individual user. We discuss how social data, ratings, and reviews can be combined to create this personalized experience. We present our work-in-progress using the Yelp Challenge dataset and discuss some first findings regarding implementation and scalability.</p>
      </abstract>
      <kwd-group>
        <kwd>Personalization</kwd>
        <kwd>recommendations</kwd>
        <kwd>sentiment analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>In this work, we present a system framework that addresses the
above issue. In a nutshell, we create user profiles that reflect each
user’s preferences for specific restaurants and restaurant qualities
(e.g. food, ambience, etc.). The profiles are created using the
rating data as well as implicit preference as identified by applying
aspect-based opinion mining to the reviews. Using these profiles,
we identify similar users and rank their reviews for new
restaurants higher. We also integrate the social network of the
user, identifying those friends who have similar preference
patterns with the active user, and highlight their reviews.</p>
      <p>Therefore, for the same restaurant, two different users will see a
different list of reviews. The system is accompanied by a
user-friendly interface that also highlights the main aspects of each
review so that the user does not have to read the full text. To
achieve this, we employ aspect-based opinion mining and
neighborhood-based collaborative filtering techniques and
integrate them in our system.</p>
      <p>
        We also present a system prototype, built using the Yelp dataset
(http://www.yelp.com/dataset_challenge/), to demonstrate a first approach to this interesting problem [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Without loss of generality, we focus on restaurant review
recommendations; however, our approach is easily extended to any
other product/business/service as long as reviews, ratings, and an
underlying social network are available. The personalized
presentation of the reviews is a subjective matter and therefore
very hard to evaluate without involving real users; however, we
provide some first empirical results. We should stress that this is a
work in progress and our focus in this paper is to introduce this
mash-up idea along with an initial approach to the problem, as
well as our thoughts on how such a system could be further
enhanced.
      </p>
      <p>The rest of the paper is organized as follows: we present our
system’s design in detail in Section 2. We provide a first-cut
approach on extending the proposed model using social network
connections and feedback in Section 3. Some discussion on the
prototype implementation and evaluation is included in Section 4.
An overview of the related work is provided in Section 5 and we
conclude with our plans for future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. SYSTEM DESIGN</title>
      <p>The system architecture is shown in Figure 1. It comprises two
main modules: an offline processing module, where the user
profiles are generated and the feature extraction and rating
happen, and an online module that generates real-time
recommendations.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Offline Processing</title>
      <p>There are two phases of offline processing: namely aspect
summarization and user preference generation.</p>
      <sec id="sec-3-1">
        <title>2.1.1 Aspect Summarization</title>
        <p>
          This module aims at extracting the important features from each
review, along with their polarity weight. To perform this we
employ the subjectivity lexicon [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] in order to map weak and
strong positive and negative words to numeric values (ranging
from -4 to +4). Using a master list of positive and negative
opinion words from an opinion lexicon [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] we created a list of
negation words (not, no, nothing, etc.), which invert the
sentiment, and intensifiers (too, very, so, etc.), which increase the
intensity of the sentiment (the latter are referred to as “TOO words” in
our algorithm). More specifically, the words of each review are
tagged using the default POS (part-of-speech) tagger from
NLTK (http://www.nltk.org), a natural language processing Python package. This is
done using the Treebank corpus. The text augmented with tags is
then split into sentences and then into words. Each word is then
examined to determine its type.
If the word is POS-tagged as an adverb or an adjective, it is
considered as an opinion word. If the opinion word is POS-tagged
as superlative or comparative the score is set to the maximum (+4)
or minimum (-4) based on the polarity. During this process, the
words that modify the polarity (e.g. “not”) and degree (e.g. “too”,
“very”) are also considered for scoring the opinion word. The
presence of these words can invert or increase the sentiment
score of the aspect, respectively. The words POS-tagged as nouns
are potential candidates to be the feature words. Apart from using
the pre-defined feature look up file, these words are also tested to
find any synonyms using the WordNet interface in NLTK.
Once the features and opinion words in a sentence are determined,
a mapping is made to a feature and opinion word based on the
distance between them. The aggregated opinion score for each
feature is calculated for all the sentences in the review as
mentioned above and the review document is updated with these
values in the system’s database. The algorithm performing this
process is outlined in Figure 2.
        </p>
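        <p>The scoring pass described above can be sketched in Python. This is a minimal illustration only: the toy lexicons, the pre-tagged input, and all helper names below are assumptions for the sketch, not the authors' actual resources or code, and the "and"-continuation rule is omitted for brevity.</p>

```python
# Minimal sketch of the aspect-scoring pass: opinion words get a score
# from a tiny stand-in subjectivity lexicon or master list, adjusted by
# TOO words, negations, and superlative/comparative POS tags; each
# feature noun is then mapped to the nearest opinion word by distance.

SUBJECTIVITY = {"great": 3, "good": 2, "bad": -2, "awful": -3}  # toy lexicon
POSITIVE_MASTER = {"tasty"}
NEGATIVE_MASTER = {"bland"}
NEGATIONS = {"not", "no", "nothing"}
TOO_WORDS = {"too", "very", "so"}
FEATURES = {"food", "service", "ambience"}

def score_sentence(tagged):
    """tagged: list of (word, POS-tag) pairs, as NLTK's tagger would emit."""
    opinions = []     # (token position, score)
    features = []     # (token position, feature word)
    intensify = False
    negate = False
    for idx, (word, tag) in enumerate(tagged):
        w = word.lower()
        if tag == "RB" and w in TOO_WORDS:
            intensify = True          # "TOO word": boosts the next opinion word
            continue
        if w in NEGATIONS:
            negate = True             # inverts the next opinion word's polarity
            continue
        if w in SUBJECTIVITY or w in POSITIVE_MASTER or w in NEGATIVE_MASTER:
            score = SUBJECTIVITY.get(w, 1 if w in POSITIVE_MASTER else -1)
            if tag in ("RBS", "JJS"):        # superlative: saturate at +/-4
                score = 4 if score > 0 else -4
            elif tag in ("RBR", "JJR"):      # comparative: +/-3
                score = 3 if score > 0 else -3
            elif intensify:                  # adjacent TOO word: +/-1
                score += 1 if score > 0 else -1
            if negate:
                score = -score
            opinions.append((idx, score))
            intensify = negate = False
        elif tag == "NN" and w in FEATURES:
            features.append((idx, w))
    # map each feature to the nearest opinion word and aggregate
    result = {}
    for f_idx, feat in features:
        if opinions:
            _, score = min(opinions, key=lambda o: abs(o[0] - f_idx))
            result[feat] = result.get(feat, 0) + score
    return result

tagged = [("the", "DT"), ("food", "NN"), ("was", "VBD"),
          ("very", "RB"), ("good", "JJ"), ("but", "CC"),
          ("the", "DT"), ("service", "NN"), ("was", "VBD"),
          ("bad", "JJ")]
print(score_sentence(tagged))  # {'food': 3, 'service': -2}
```

        <p>In the full system, NLTK's POS tagger supplies the (word, tag) pairs and WordNet supplies feature synonyms; the review document is then updated with the aggregated per-feature scores.</p>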
      </sec>
      <sec id="sec-3-2">
        <title>2.1.2 User profile generation</title>
        <p>In order to generate personalized review rankings, we follow a
neighborhood-based collaborative filtering approach. Given a user
u and the set of businesses B_u they have rated and/or reviewed,
each user is represented by a profile vector P_u = (p_u1, ..., p_uk),
where p_ui denotes the preference of user u for business i and k is
the total number of businesses in the system. We define p_ui as
follows:</p>
        <p>p_ui = s_ui if i ∈ B_u, and p_ui = 0 otherwise (1)
where s_ui represents the cumulative preference score of a user u
for a business i, calculated using their overall rating or opinion on
specific aspects of the business that are identified by their review
for it.</p>
        <p>Figure 2 outlines the aspect-scoring algorithm in pseudocode:
For each sentence in the review
  For each word in the sentence
    if the POS of the word is Adverb (RB) and it is a TOO word
      Save the TOO word position
    if the word is 'and'
      Continue the TOO rule
    # opinion word
    if the word is in the subjectivity lexicon or in the master list
      if the POS tag is a superlative adverb (RBS) or adjective (JJS)
        Set the superlative flag
      if the POS tag is a comparative adverb (RBR) or adjective (JJR)
        Set the comparative flag
      if the word is in the subjectivity lexicon
        Set the word score
      else if the word is in the positive master list
        Set the word score to +1
      else if the word is in the negative master list
        Set the word score to -1
      if a TOO word exists and is adjacent
        if the word is positive, increase the score by 1
        if the word is negative, decrease the score by 1
      if the superlative flag is set
        if the word is positive, set the score to +4
        if the word is negative, set the score to -4
      if the comparative flag is set
        if the word is positive, set the score to +3
        if the word is negative, set the score to -3
      if the opinion word is in a negative context
        Negate the sentiment of the score
      Save the opinion word position and score
    # feature word
    if the POS of the word is a Noun (NN)
      if the word is in the feature list or is a synonym of a feature
        Save the feature and its position
  Apply the opinion scores to the potential features in the sentence
Aggregate the score for each feature</p>
        <p>We introduce three alternative ways of calculating the
preference score, namely using only the rating of the user, using
the specific review opinion scores, or weighing them by the
overall preference/dislike of the user for each aspect, as shown in
Equations 2, 3 and 4 respectively:</p>
        <sec id="sec-3-2-1">
          <title>Rating-based preference score</title>
          <p>s_ui = r_ui (2)
where r_ui denotes the star rating of user u for business i.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Business-based preference score</title>
          <p>s_ui = Σ_{a ∈ A_ui} o_a(u,i) (3)
where A_ui denotes the set of aspects included in the review of user
u for business i and o_a(u,i) is the opinion score calculated for aspect
a in this particular review.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Review-based preference score</title>
          <p>s_ui = Σ_{a ∈ A_ui} w_ua · o_a(u,i) (4)
where w_ua denotes the overall preference/dislike of user u for
aspect a, as expressed by their opinions in all the reviews they have
written. It can be calculated as the normalized sum of the
scores o_a(u,i) over all the reviews R_u of user u.</p>
          <p>Once the user profiles are created, we employ a user-based
collaborative filtering technique to find similar users. In our
implementation, we have used the Pearson correlation coefficient
and the open source libraries provided by Apache Mahout.</p>
        </sec>
      </sec>
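      <p>The three alternative preference scores can be sketched as follows. The data layout (dictionaries keyed by user/business pairs) and the uniform normalization in the aspect weight are assumptions for this illustration, not the system's actual schema:</p>

```python
# Illustrative computation of the three preference scores (Eqs. 2-4).
# ratings holds the star rating r_ui; aspect_scores holds the per-review
# aspect opinion scores o_a produced by the aspect summarization step.

ratings = {("u1", "b1"): 4}
aspect_scores = {("u1", "b1"): {"food": 3, "service": -2}}

def rating_based(u, i):
    """Eq. 2: s_ui is simply the star rating."""
    return ratings[(u, i)]

def business_based(u, i):
    """Eq. 3: s_ui is the sum of the review's aspect opinion scores."""
    return sum(aspect_scores[(u, i)].values())

def aspect_weight(u, a):
    """w_ua: normalized sum of u's scores for aspect a over all reviews."""
    scores = [s[a] for (usr, _), s in aspect_scores.items()
              if usr == u and a in s]
    return sum(scores) / len(scores) if scores else 0.0

def review_based(u, i):
    """Eq. 4: aspect scores weighted by the user's overall aspect preference."""
    return sum(aspect_weight(u, a) * o
               for a, o in aspect_scores[(u, i)].items())

print(rating_based("u1", "b1"))    # 4
print(business_based("u1", "b1"))  # 1
print(review_based("u1", "b1"))    # 13.0
```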
    </sec>
    <sec id="sec-4">
      <title>2.2 Online Recommendations</title>
      <p>This step is used to rank and recommend reviews in real-time, as
the user navigates the system and searches for new restaurants.
When a given user searches for a specific restaurant, the
recommendation engine computes the similarity of the current
user with all the reviewers of the particular business and ranks and
presents the related reviews in descending order of similarity. As
a result, each user will be presented with a different set of reviews
for the same business.</p>
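      <p>The online ranking step can be sketched as below. Note this is a minimal pure-Python stand-in: the actual system delegates the Pearson-correlation similarity computation to Apache Mahout, and all names and sample data here are illustrative.</p>

```python
# Rank a business's reviews by the Pearson correlation between the
# active user's profile vector and each reviewer's profile vector.
from math import sqrt

def pearson(p, q):
    """Pearson correlation over the businesses both profiles cover."""
    common = set(p) & set(q)
    n = len(common)
    if n < 2:
        return 0.0
    mp = sum(p[i] for i in common) / n
    mq = sum(q[i] for i in common) / n
    num = sum((p[i] - mp) * (q[i] - mq) for i in common)
    den = (sqrt(sum((p[i] - mp) ** 2 for i in common)) *
           sqrt(sum((q[i] - mq) ** 2 for i in common)))
    return num / den if den else 0.0

def rank_reviews(active_profile, reviews, profiles):
    """reviews: list of (reviewer, review_text) for the searched business;
    returns them in descending order of reviewer similarity."""
    return sorted(reviews,
                  key=lambda r: pearson(active_profile, profiles[r[0]]),
                  reverse=True)

profiles = {
    "alice": {"b1": 5, "b2": 1, "b3": 4},   # similar tastes to the active user
    "bob":   {"b1": 1, "b2": 5, "b3": 2},   # opposite tastes
}
active = {"b1": 4, "b2": 2, "b3": 5}
reviews = [("bob", "Loved the noodles"), ("alice", "Great patio")]
print(rank_reviews(active, reviews, profiles))  # alice's review ranks first
```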
      <p>Moreover, the interface allows the end user to get the gist of the
reviews without the need to read the entire review text. For each
review, the overall star rating as well as the most important
aspects of each review, are prominently shown. The aspects are
intuitively marked as strong/weak positive/negative, by using
colors and thumbs up/down images. We should stress that the
same aspect might appear in more than one review, and one
review might contain more than one aspect.</p>
    </sec>
    <sec id="sec-5">
      <title>3. SOCIAL NETWORK FEEDBACK</title>
      <p>When available, information related to the user’s social network
can be incorporated into our model. There are two alternative ways
this can be done: either at the last step of the process, or during the
profile generation.</p>
      <p>In the first case, the similarity between the user and their friends is
calculated when the user searches for the restaurant. The friends’
reviews for this restaurant are separately ranked and presented in a
different list so that they are easily identifiable.</p>
      <p>In the second case, the user preferences are weighed by the user’s
friends’ opinion scores. To incorporate the social network
feedback in the model, we extend Equation 1 as follows:
p_ui = f_ui · s_ui if i ∈ B_u, and p_ui = f_ui otherwise (5)
where F_u is the set of friends of user u and f_ui can be defined as
follows:</p>
      <p>f_ui = (1/|F_u|) Σ_{v ∈ F_u} s_vi (6)
Equation 6 can be easily extended to incorporate the similarities
between users.</p>
      <p>Note that this extension also addresses the cold-start problem,
since the user profile can be filled in from social network feedback
even when the user has few or no reviews/ratings in the
system.</p>
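      <p>A minimal sketch of the social-feedback extension, under the reading of Equations 5 and 6 above (f_ui as the average of the friends' preference scores; all names and sample data are illustrative assumptions):</p>

```python
# Eq. 6: f_ui is the average of the friends' preference scores for
# business i. Eq. 5: for a business the user has scored, weigh their own
# score by the friends' score; for an unrated business (cold start),
# fall back to the friends' score alone.

scores = {("u1", "b1"): 2.0}                 # s_ui: user's own scores
friends = {"u1": ["u2", "u3"]}               # F_u
friend_scores = {("u2", "b1"): 4.0, ("u3", "b1"): 2.0,
                 ("u2", "b2"): 5.0}          # s_vi for the friends

def f(u, i):
    """Eq. 6: average preference of u's friends for business i."""
    vals = [friend_scores[(v, i)]
            for v in friends[u] if (v, i) in friend_scores]
    return sum(vals) / len(vals) if vals else 0.0

def p(u, i):
    """Eq. 5 with a cold-start fallback for unrated businesses."""
    if (u, i) in scores:
        return f(u, i) * scores[(u, i)]
    return f(u, i)

print(p("u1", "b1"))   # 6.0: friends' average (3.0) times own score (2.0)
print(p("u1", "b2"))   # 5.0: cold start, friends' score alone
```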
    </sec>
    <sec id="sec-6">
      <title>4. PROTOTYPE EVALUATION</title>
      <p>
        We have already implemented a prototype based on our system
design described in the previous sections using the Yelp dataset.
Our prototype implements the business-based preference profile,
assuming that the product aspects are predetermined. A screenshot
of our prototype is shown in Figure 5. Each review is
accompanied by some metrics showing the calculated polarity and
subjectivity of the review as well as the similarity of each
reviewer to the user. The end user may further refine the
personalized list of reviews by filtering only those that come from
his/her friends or by feature (e.g. location, food, etc.). More
technical details on the implementation are included in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
We have load-tested the prototype, deployed on a Tomcat server on
a machine with the following configuration: Intel i5-2410M CPU
@2.30 GHz, 64-bit OS, 4 GB RAM. As shown in Figure 3,
the response time increases linearly with the number of users, and
the system handles multiple simultaneous requests in real time (it
crashed after 175 simultaneous requests, as MongoDB could not
handle that many connections).
      </p>
      <p>[Figure 3. Response time (ms) versus number of users, for 25 to 175 simultaneous users.]</p>
      <p>[Figure 4. Average precision (Prec3, Prec2) of the recommendations.]</p>
      <p>
We also performed an empirical evaluation of the
recommendations using the following methodology: we randomly
picked 50 users and generated top-5 recommendations for a
specific restaurant. We then asked human evaluators to rate each
recommended review on the following scale: 1 = “irrelevant”, 2 =
“somewhat relevant”, 3 = “very relevant”. To assign the rankings,
the evaluators were asked to identify 2-4 aspects highlighted in
each user’s review. If the recommended review included &gt;50%
of the aspects it received a 3; if it was very uninformative or
did not include any aspects it received a 1; everything else
received a 2. We employ precision as our evaluation metric and
define Prec3 and Prec2, measuring how many recommendations
received a “3” or a “2 or 3” rating respectively.</p>
      <p>[Figure 4 reports Prec3 and Prec2 for three user groups: all users, users with 1-2 reviews, and users with 3 or more reviews, grouped by their total number of reviews.]</p>
      <p>
We observe that the system manages to recommend 60% or more
very relevant recommendations, while the accuracy reaches
100% when the somewhat relevant recommendations are
included. The accuracy increases further when the “cold-start” users
(i.e. users with only 1 or 2 reviews, who contribute 48% of the
subset) are removed. We noticed that the system most often
failed to generate useful recommendations when the style of
the review was sarcastic and/or focused on non-trivial issues (e.g.
servers engaged in a fight). Moreover, as the aspects currently
used are very high-level, the results did not capture specific food
preferences of the users (e.g. vegan vs. meat lover). On the other
hand, the algorithm has been quite successful in identifying
priorities such as the atmosphere, service quality, drink options, etc.
As a reference, the number of individual user reviews for this
subset ranged from 1 to 36 (mean = 4.7, median = 3).
(A screencast of the prototype is available at http://youtu.be/vMz5CobpIw4.
The aspects used by the evaluators were not identical to the ones used by our
prototype; instead, the evaluators were asked to identify anything that stood
out, e.g. whether a user favors short reviews, or values price/service/food.)</p>
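      <p>The two precision metrics used above can be computed as follows (an illustrative sketch; the sample rating list is made up, not the study's data):</p>

```python
# Prec3: fraction of recommendations rated "very relevant" (3).
# Prec2: fraction rated "somewhat relevant" or better (2 or 3).

def prec3(ratings):
    return sum(1 for r in ratings if r == 3) / len(ratings)

def prec2(ratings):
    return sum(1 for r in ratings if r >= 2) / len(ratings)

ratings = [3, 2, 3, 1, 3]   # evaluator ratings for one user's top-5 list
print(prec3(ratings), prec2(ratings))  # 0.6 0.8
```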
    </sec>
    <sec id="sec-7">
      <title>5. RELATED WORK</title>
      <p>
        Many interesting works exist that focus on extracting the opinions
from the customer reviews [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The most recent ones employ
features as an additional tool in representing the semantic
orientation of a review [
        <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
        ]. This is an important line of work
that provides very useful input in the creation of the rich user
profiles of our system. The algorithm we introduce in this paper is
along the same lines; however, we should note that any similar
approach could be easily integrated into our system.
      </p>
      <p>
        None of the major web sites that include reviews as an
indispensable part of their business provide aspect-oriented
personalized review rankings. For instance, Amazon ranks
reviews by helpfulness (number of “helpful” votes received)
without providing any summary of the reviews, other than the
overall star rating. Netflix’s rating system is also mainly based on
the star ratings, whereas Google shopping allows users to create a
list of pros and cons in addition to the review, but ranks them
based on the review date. Finally, Yelp, whose dataset we are
employing in this study, ranks reviews by helpfulness. It also
provides an overall summary for each business in terms of several
aspects (e.g. friendly for kids, romantic, etc.), as well as a short
summary of the most common comments in the reviews. The last
two companies have some underlying social network that is not,
however, utilized in re-ranking or personalizing the reviews.
Similarly, not much work has been done in the research
community. The problem of using helpfulness as a way to rank
results is discussed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The authors conclude that for
experience goods, users prefer a brief description of the
“objective” elements of the item and then a subjective positioning,
described by aspects not captured by the product description. Our
work not only addresses these findings, but also proposes ways of
personalizing the rankings for each user, taking into consideration
their social network as well. Helpfulness is also used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as a
way to filter out interesting reviews. This work addresses the
same problem in a somewhat different way. The authors employ
the feedback given by the community in terms of how helpful
one's reviews are, along with several other content-, social-, and
sentiment-based features in order to classify a review as helpful or
not. The main differences with our approach are that their sentiment
scores are based on explicit sub-ratings given by the users to several
predetermined aspects of a service, and that
the authors assume that a “helpfulness” vote exists for each review
in the dataset.
      </p>
    </sec>
    <sec id="sec-8">
      <title>6. CONCLUSIONS</title>
      <p>The amount of online reviews for products and services has grown
to such an extent that it is often impossible to read all of them.
In this work we propose a system that personalizes the order in
which the reviews are shown and provides an intuitive interface
that allows users to see the important aspects of each review
at a glance. An initial evaluation shows promising results. As part
of our future work we plan to further integrate these two types of
recommendations and enhance them by introducing trust-based
and reputation metrics. We also plan to perform a more extensive
evaluation of the usefulness of such reordering.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A holistic lexicon-based approach to opinion mining</article-title>
          ,
          <source>in Proc. of WSDM '08</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eirinaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pisal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Feature-based Opinion Mining and Ranking</article-title>
          ,
          <source>J. of Computer and System Sciences (JCSS)</source>
          ,
          <volume>78</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>1175</fpage>
          -
          <lpage>1184</lpage>
          ,
          <year>July 2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          ,
          <article-title>Designing Novel Review Ranking Systems:Predicting the Usefulness and Impact of Reviews</article-title>
          ,
          <source>in Proc. of ICEC '07</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>Address standardization with latent semantic association</article-title>
          ,
          <source>in Proc. of ACM KDD'09</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Sentiment Analysis and Opinion Mining</article-title>
          , Morgan &amp; Claypool Publishers, May 2012
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>O'Mahony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <article-title>A classification-based review recommender</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          ,
          <volume>23</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>323</fpage>
          -
          <lpage>329</lpage>
          , May 2010
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Roohi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eirinaki</surname>
          </string-name>
          ,
          <article-title>Aspect based Opinion Mining and Recommendation System for Restaurant Reviews, demo paper</article-title>
          ,
          <source>in Proc. of ACM RecSys 2014</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <article-title>Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis</article-title>
          .
          <source>In Proc. of HLT-EMNLP-2005.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>