<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>New Features for Sentiment Analysis: Do Sentences Matter?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gizem Gezici</string-name>
          <email>gizemgezici@sabanciuniv.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Berrin Yanikoglu</string-name>
          <email>berrin@sabanciuniv.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dilek Tapucu</string-name>
          <email>dilektapucu@sabanciuniv.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yücel Saygın</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Engineering, Izmir Institute of Technology</institution>
          ,
          <addr-line>Izmir</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Engineering and Natural Sciences, Sabancı University</institution>
          ,
          <addr-line>Istanbul</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <fpage>5</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this work, we propose and evaluate new features to be used in a word polarity based approach to sentiment classification. In particular, we analyze sentences as the first step before estimating the overall review polarity. We consider different aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is then used to find sentences that may convey better information about the overall review polarity. The TripAdvisor dataset is used to evaluate the effect of sentence level features on polarity classification. Our initial results indicate a small improvement in classification accuracy when using the newly proposed features. However, the benefit of these features is not limited to improving sentiment classification accuracy since sentence level features can be used for other important tasks such as review summarization.</p>
      </abstract>
      <kwd-group>
        <kwd>sentiment analysis</kwd>
        <kwd>sentiment classification</kwd>
        <kwd>polarity detection</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Sentiment analysis aims to extract the opinions expressed in textual data,
enabling us to understand what people think about specific issues by analyzing
large collections of textual sources such as personal blogs, review sites, and
social media. An important part of sentiment analysis boils down to a classification
problem: given an opinionated text, classify it as having positive or negative
polarity. Machine learning techniques have already been adopted to solve
this problem.</p>
      <p>
        Two main approaches to sentiment analysis are lexicon-based and
supervised methods. The lexicon-based approach calculates the semantic orientation
of words in a review by obtaining word polarities from a lexicon such as
SentiWordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While SentiWordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a domain-independent lexicon, one
should use a domain-specific lexicon whenever available, since domain-specific
lexicons better capture word polarities in that domain (e.g., the word "small"
has a positive connotation in the cell phone domain, while it is negative in the hotel
domain).
      </p>
      <p>
        Supervised learning approaches use machine learning techniques to build
a model from a large corpus of reviews. The set of sample reviews forms the
training data from which the model is built. For instance, in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], researchers
use the Naive Bayes algorithm to separate positive reviews from negative ones
by learning the probability distributions of the considered features in the two
classes. While supervised approaches are typically more successful, collecting
large training datasets is often a problem.
      </p>
      <p>Word-level polarities provide a simple yet effective method for estimating
a review's polarity; however, the gap between word-level polarities and review-level
polarity is large. To bridge this gap, we propose to analyze word polarities
within sentences as an intermediate step.</p>
      <p>
        The idea of sentence-level analysis is not new. Some researchers approached
the problem by first finding subjective sentences in a review, with the hope of
eliminating irrelevant sentences that would generate noise in polarity
estimation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Yet another approach is to exploit the structure of
sentences, rather than treating a review as a bag of words [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For instance,
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], conjunctions were analyzed to obtain the polarities of the words
connected by the conjunct. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], researchers focused on sentence
polarities separately, again to obtain sentence polarities more accurately, with the goal
of improving review polarity estimation in turn. The polarity of the first line has also
been used as a feature in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>
        Similar to [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], this work is motivated by our observation that the first and
last lines of a review are often very indicative of the review polarity. Starting from
this simple observation, we formulated more sophisticated features for sentence-level
sentiment analysis. To do so, we performed an in-depth analysis
of different sentence types. For instance, in addition to subjective sentences, we
defined pure, short, and non-irrealis sentences.
      </p>
      <p>We performed a preliminary evaluation using the TripAdvisor dataset to see
the effect of sentence-level features on polarity classification. Throughout the
evaluation, we observed a small improvement in classification accuracy due to
the newly proposed features. Our initial results showed that sentences do
matter and need to be explored in larger and more diverse datasets such
as blogs. Moreover, the benefit of these features is not limited to improving
sentiment classification accuracy: sentence-level features can also be used
to identify the essential sentences in a review, which could further be used in
review summarization.</p>
      <p>Our paper is organized as follows: Section 2 presents our taxonomy of
sentiment analysis features, together with the newly proposed features. Section 3
describes the sentence-level analysis used to define the features. Section 4 describes
the tools and methodology for sentiment classification, together with the
experimental results and error analysis. Finally, in Section 5 we draw conclusions
and propose future extensions of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Taxonomy and Formulation of the New Features</title>
      <p>We define an extensive set of 19 features that can be grouped into four categories:
(1) basic features, (2) features based on subjective sentence occurrence statistics,
(3) delta-tf-idf weighting of word polarities, and (4) sentence-level features. These
features are listed in Table 1; using the notation given below and the basic
definitions provided in Table 2, they are defined formally in Tables 3-7.</p>
      <p>A review R is a sequence of sentences, R = S1 S2 S3 ... SM, where M is the
number of sentences in R. Each sentence Si is in turn a sequence of words,
Si = wi1 wi2 ... wiN(i), where N(i) is the number of words in Si. The review R
can also be viewed as a sequence of words w1 ... wT, where T is the total number
of words in the review.</p>
      <p>
        In Table 2, subjective words (SBJ) are defined as all the words in
SentiWordNet that have a dominant negative or positive polarity. A word has a dominant
positive or negative polarity if the sum of its positive and negative polarity values
is greater than 0.5 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. SubjW(R) is defined as the most frequent subjective
words in SBJ (at most 20 of them) that appear in review R. For a sentence
Si ∈ R, the average sentence polarity is used to determine the subjectivity of that
sentence. If it is above a threshold, we consider the sentence subjective,
forming subjS(R). Similarly, a sentence Si is pure if its purity is greater than a fixed
threshold τ. We experimented with different values of τ and used τ = 0.8 for
evaluation. These two sets form the subjS(R) and pure(R) sets, respectively.
We also looked at the effect of the first and last sentences in the review, as well as
sentences containing irrealis words. To determine irrealis sentences, the
presence of the modal verbs 'would', 'could', or 'should' is checked. If one of
these modal verbs appears in a sentence, the sentence is labeled as
irrealis, similar to [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
The notation is summarized below (from Table 2):
- M: the total number of sentences in R
- T: the total number of words in R
- SBJ: the set of known subjective words
- subjW(R): the set of most frequent subjective words from SBJ in R (at most 20)
- subjS(R): the set of subjective sentences in R
- pure(R): the set of pure sentences in R
- nonIr(R): the set of non-irrealis sentences in R
      </p>
      <p>
For our baseline system, we use the average word polarity and purity defined in
Table 3. As mentioned before, these features are commonly used in word polarity
based sentiment analysis. In our formulation, pol(wj) denotes the dominant
polarity of word wj, as obtained from SentiWordNet, and |pol(wj)| denotes the
absolute polarity of wj.
      </p>
      <p>
The features in this group are derived through the analysis of subjective words
that frequently occur in the review. For instance, the average polarity of the most
frequent subjective words (feature F4) aims to capture the frequent sentiment
in the review, without the noise coming from all subjective words.
      </p>
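      <p>
        As an illustration, the sentence-level definitions above (average polarity, purity, subjectivity, and irrealis detection) can be sketched as follows. This is a minimal sketch under our own assumptions: the toy lexicon stands in for SentiWordNet, purity is taken as the signed polarity sum normalized by the absolute polarity sum, and the helper names are ours, not from the paper.
      </p>
      <preformat>
```python
# Minimal sketch of the sentence-level analysis described in the text.
# TOY_LEXICON stands in for SentiWordNet's dominant word polarities.
TOY_LEXICON = {"great": 0.8, "clean": 0.6, "dirty": -0.7, "awful": -0.9}
IRREALIS_MODALS = {"would", "could", "should"}

def polarity(word):
    return TOY_LEXICON.get(word.lower(), 0.0)

def avg_polarity(words):
    return sum(polarity(w) for w in words) / len(words) if words else 0.0

def purity(words):
    # One common definition: signed polarity sum over absolute polarity sum.
    total = sum(abs(polarity(w)) for w in words)
    return sum(polarity(w) for w in words) / total if total else 0.0

def is_pure(words, tau=0.8):
    # A sentence is pure if its absolute purity reaches the threshold tau.
    return abs(purity(words)) >= tau

def is_subjective(words, threshold=0.5):
    # The average sentence polarity determines subjectivity (threshold assumed).
    return abs(avg_polarity(words)) >= threshold

def is_irrealis(words):
    # A sentence is irrealis if it contains one of the modal verbs.
    return any(w.lower() in IRREALIS_MODALS for w in words)
```
      </preformat>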
      <p>
        These features were defined in some previous work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; however, to the
best of our knowledge, that work considered all words, not specifically subjective
words.
      </p>
      <p>
We compute the Δtf·idf scores of the words in SentiWordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] from a training
corpus in the given domain, in order to capture domain specificity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For a
word wi, the score is defined as Δtf·idf(wi) = tf·idf(wi, +) − tf·idf(wi, −).
      </p>
      <p>A positive score indicates that a word is more associated with the positive
class, and vice versa if negative. We computed these scores on the training set,
which is balanced in the number of positive and negative reviews.</p>
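      <p>
        The Δtf·idf computation above can be sketched as follows. This is a minimal sketch assuming raw term frequencies and a plain logarithmic inverse document frequency; the exact weighting variant used in the cited work may differ, and the function names are ours.
      </p>
      <preformat>
```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """docs: list of token lists. Returns {word: tf*idf} over the collection."""
    n_docs = len(docs)
    tf = Counter()  # raw term frequency over the whole collection
    df = Counter()  # document frequency
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

def delta_tf_idf(pos_docs, neg_docs):
    """Delta tf*idf(w) = tf*idf(w, +) - tf*idf(w, -), as in the text."""
    pos = tf_idf_scores(pos_docs)
    neg = tf_idf_scores(neg_docs)
    words = set(pos) | set(neg)
    return {w: pos.get(w, 0.0) - neg.get(w, 0.0) for w in words}
```
      </preformat>
      <p>
        Words appearing mostly in positive reviews receive positive scores, words appearing mostly in negative reviews receive negative scores, and words distributed evenly score near zero.
      </p>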
      <p>
        We then sum up the Δtf·idf scores of these words (feature F6). By
doing this, our goal is to capture the difference in the distribution of these words
between positive and negative reviews. The aim is to obtain context-dependent
scores that may replace the polarities coming from SentiWordNet, which is a
context-independent lexicon [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. With the context-dependent
information provided by the Δtf·idf related features, we expect to better differentiate
positive reviews from negative ones.
      </p>
      <p>
        We also tried another feature combining these two sources of information, where we
weighted the polarities of all words in the review by their Δtf·idf scores (feature
F7).
      </p>
      <p>
We also have two features related to punctuation. These two features were suggested
in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and since we have seen that they can be useful in some cases, we included
them in our sentiment classification system.
      </p>
      <sec id="sec-2-1">
        <title>Sentence Level Features</title>
        <p>
          Sentence-level features are extracted from specific types of sentences that
are identified through a sentence-level analysis of the corpus. For instance, the
first and last line polarity/purity are features that depend on sentence position,
while the average polarity of words in subjective, pure, etc. sentences are new features
that consider only subjective or pure sentences, respectively.
        </p>
        <p>
We tried three different approaches to obtaining the review polarity. In the first
approach, each review is pruned to keep only the sentences that are likely
more useful for sentiment analysis. For pruning, thresholds were set separately
for each sentence-level feature. Sentences with a length of at most 12 words are
accepted as short, and sentences with an absolute purity of at least 0.8 are defined
as pure. For the subjectivity of sentences, we adopted the idea
described in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and applied it to sentences rather than words.
        </p>
        <p>Pruning sentences in this way resulted in lower accuracy in general, due
to loss of information. Thus, in the second approach, the polarities in special
sentences (pure, subjective, short, or non-irrealis) were given higher weights when
computing the average word polarity. In effect, other sentences were given lower
weight, rather than being pruned outright.</p>
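        <p>
          The second approach, weighting polarities in special sentences more heavily, can be sketched as follows. The weight values and function names are our own illustrative choices, not parameters reported in the paper.
        </p>
        <preformat>
```python
def weighted_review_polarity(sentences, special, w_special=2.0, w_other=1.0):
    """Average word polarity where words in 'special' sentences
    (pure/subjective/short/non-irrealis) get a higher weight.
    `sentences` is a list of (avg_word_polarity, n_words) per sentence;
    `special` is a parallel list of booleans."""
    num = 0.0
    den = 0.0
    for (pol, n), flag in zip(sentences, special):
        w = w_special if flag else w_other
        num += w * pol * n  # weighted sum of word polarities
        den += w * n        # weighted word count
    return num / den if den else 0.0
```
        </preformat>
        <p>
          With equal weights this reduces to the plain average word polarity, so the baseline is recovered as a special case.
        </p>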
        <p>In the final approach, which gave the best results, we used the information
extracted from the sentence-level analysis as features for training our system.</p>
        <p>We believe that our main contribution is the introduction and evaluation of
sentence-level features; beyond these, some well-known and commonly
used features are integrated into our system, as explained in the next section.</p>
        <p>
          Our approach depends on the existence of a sentiment lexicon that provides
information about the semantic orientation of single or multiple terms.
Specifically, we use SentiWordNet [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], where for each term with a specific function,
its positive, negative, or neutral appraisal strength is indicated (e.g., "good",
ADJ, 0.5).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Implementation and Experimental Evaluation</title>
      <p>
        In this section, we provide an evaluation of the sentiment analysis features based
on word polarities. We use the dominant polarity for each word (the largest
polarity among the negative, objective, and positive categories) obtained from
SentiWordNet. We evaluate the newly proposed features and compare their performance to
a baseline system. Our baseline system uses two basic features, the
average polarity and purity of the review. These features were previously suggested
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and are widely used in word polarity-based sentiment analysis. They are
defined in Table 3 for completeness. The evaluation procedure used in our
experiments is described in the following subsections.
      </p>
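      <p>
        The dominant-polarity selection described above (the largest among the negative, objective, and positive scores) can be sketched as follows; the tie-breaking behavior is our own assumption, not specified in the paper.
      </p>
      <preformat>
```python
def dominant_polarity(pos, neg, obj):
    """Signed dominant polarity of a word given its SentiWordNet-style
    positive, negative, and objective scores: the largest category wins,
    and objectively dominant words contribute 0."""
    best = max(pos, neg, obj)
    if best == obj:
        return 0.0
    return pos if best == pos else -neg
```
      </preformat>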
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          We evaluated the performance of our system on the TripAdvisor sentiment dataset,
introduced in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The TripAdvisor corpus
consists of around 250,000 customer-supplied reviews of 1,850 hotels. Each
review is associated with a hotel and a star rating, from 1 star (most negative) to 5 stars
(most positive), chosen by the customer to indicate their evaluation.
        </p>
        <p>We evaluated the performance of our approach on a dataset randomly chosen
from the TripAdvisor corpus. Our dataset consists of 3000 positive and 3000 negative
reviews. After the 6000 reviews were chosen randomly, they were shuffled
and split into three groups: train, validation, and test sets. Each of these
sets has 1000 positive and 1000 negative reviews.</p>
        <p>
          We computed our features and labeled our instances (reviews)
according to the customer-given ratings. If the rating of a review is greater
than 2, it is labeled as positive, and otherwise as negative. These
intermediate files were generated with Java code in Eclipse and given to WEKA [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
for binary classification.
        </p>
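        <p>
          The labeling and splitting procedure described above can be sketched as follows. This is a hypothetical re-implementation for illustration (the paper used Java and WEKA); the fixed random seed is our own choice for reproducibility.
        </p>
        <preformat>
```python
import random

def label_review(star_rating):
    """Reviews rated above 2 stars are positive (label 1), else negative (0)."""
    return 1 if star_rating > 2 else 0

def split_reviews(reviews, seed=0):
    """Shuffle and split the reviews into equal train/validation/test sets."""
    rng = random.Random(seed)
    shuffled = reviews[:]
    rng.shuffle(shuffled)
    k = len(shuffled) // 3
    return shuffled[:k], shuffled[k:2 * k], shuffled[2 * k:]
```
        </preformat>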
      </sec>
      <sec id="sec-3-2">
        <title>Sentiment Classification</title>
        <p>
          Initially, we tried several classifiers that are known to work well for such
classification tasks. Based on their performance, we decided to use Support
Vector Machines (SVMs) and Logistic Regression. SVMs are known for handling
large feature spaces while limiting overfitting, while
Logistic Regression is a simple, commonly used, and well-performing classifier.
The SVM is trained using a radial basis function (RBF) kernel as provided by
LibSVM [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]; the RBF kernel worked better than other kernels
on our dataset. We then performed a grid search on the validation set for
parameter optimization.
        </p>
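        <p>
          The grid-search step can be sketched generically as follows. Here <monospace>train_and_score</monospace> is a hypothetical callback supplied by the caller that trains an RBF-kernel SVM with the given (C, gamma) pair and returns validation accuracy; the parameter grids are illustrative, not the ones used in the paper.
        </p>
        <preformat>
```python
def grid_search(train_and_score, Cs, gammas):
    """Pick the (C, gamma) pair that maximizes validation accuracy.
    `train_and_score(C, gamma)` trains on the training set and returns
    the accuracy on the validation set."""
    best = None
    best_acc = -1.0
    for C in Cs:
        for g in gammas:
            acc = train_and_score(C, g)
            if acc > best_acc:
                best_acc = acc
                best = (C, g)
    return best, best_acc
```
        </preformat>
        <p>
          The chosen parameters are then used to retrain on the training set before the final evaluation on the test set, as described in the next subsection.
        </p>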
      </sec>
      <sec id="sec-3-3">
        <title>Experimental Results</title>
        <p>
          To evaluate our sentiment classification system, we used binary
classification with two classifiers, namely SVMs and Logistic Regression. Reviews
with a star rating greater than 2 are positive and the rest are negative,
since we focus on binary classification of reviews. Apart
from this, we also examined the importance of the features, both via
the feature-ranking facility of WEKA [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and
via the gradual accuracy increase as we add a new feature to the existing
subset of features.
        </p>
        <p>For these results, we used grid search on the validation set. Then, using the
optimal parameters, we trained our system on the training set and tested it on the
test set.</p>
        <p>The results for the best performing feature combinations described in Table 1
are given in Table 8. As can be seen in this table, using sentence-level features
brings improvements over the best baseline results, albeit small ones.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Discussion</title>
        <p>
          As can be seen in the experiments section, our system with the newly proposed
features obtains one of the best results reported so far, except for [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Although
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] obtains the best result on a large TripAdvisor dataset, its main drawback is
that topic models learned by methods such as LDA require re-training when
a new topic appears. In contrast, our system uses word polarities and is therefore
very simple and fast. For this reason, it is fairer to compare our system with
similar systems in the literature.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>In this work, we tried to bridge the gap between word-level polarities and
review-level polarity through an intermediate step of sentence-level analysis of
reviews. We formulated new features for sentence-level sentiment analysis through an
in-depth analysis of sentences. We implemented the proposed features and
evaluated them on the TripAdvisor dataset to see the effect of sentence-level
features on polarity classification. We observed that sentence-level features
have an effect on sentiment classification; therefore, we conclude that
sentences do matter in sentiment analysis and need to be explored on larger
and more diverse datasets such as blogs. For future work, we will evaluate each
feature set both in isolation and in groups, and work on improving
accuracy. Furthermore, we will move to a regression setting for estimating the star
rating of reviews.</p>
      <p>Sentence level features have other uses since they can be exploited further to
identify the essential sentences in the review. We plan to incorporate sentence
level features for highlighting the important sentences and review summarization
in our open source sentiment analysis system SARE which may be accessed
through http://ferrari.sabanciuniv.edu/sare.</p>
      <p>Acknowledgements. This work was partially funded by the European
Commission, FP7, under the UBIPOL (Ubiquitous Participation Platform for Policy
Making) project (www.ubipol.eu).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums</article-title>
          .
          <source>ACM Transactions on Information Systems 26</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bespalov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shokoufandeh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Sentiment classification based on supervised latent n-gram analysis</article-title>
          .
          <source>In: ACM Conference on Information and Knowledge Management (CIKM)</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>LIBSVM: a library for support vector machines</article-title>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Denecke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>How to assess customer opinions beyond language barriers?</article-title>
          In: ICDIM. pp.
          <fpage>430</fpage>
          -
          <lpage>435</lpage>
          . IEEE (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Sentiwordnet: A publicly available lexical resource for opinion mining</article-title>
          .
          <source>In: Proceedings of the 5th Conference on Language Resources and Evaluation (LREC'06)</source>
          . pp.
          <fpage>417</fpage>
          -
          <lpage>422</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gindl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weichselbraun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scharl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Cross-domain contextualization of sentiment lexicons</article-title>
          .
          <source>Media</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gräbner</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fliedl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuchs</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Classification of customer reviews based on sentiment analysis</article-title>
          .
          <source>Social Sciences</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hatzivassiloglou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mckeown</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          :
          <article-title>Predicting the semantic orientation of adjectives</article-title>
          .
          <source>In: Proceedings of ACL-97, 35th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>174</fpage>
          -
          <lpage>181</lpage>
          . Association for Computational Linguistics (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of opinion bearing words and sentences</article-title>
          . pp.
          <fpage>61</fpage>
          -
          <lpage>66</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>R.Y.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruza</surname>
            ,
            <given-names>P.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>K.F.</given-names>
          </string-name>
          :
          <article-title>Leveraging web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons</article-title>
          .
          <source>In: Proceedings of the 20th ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>2457</fpage>
          -
          <lpage>2460</lpage>
          . CIKM '11, ACM, New York, NY, USA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lebanon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Isotonic conditional random fields and local sentiment flow</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Martineau</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Delta tfidf: An improved feature space for sentiment analysis</article-title>
          . In: Adar, E., Hurst, M., Finin, T., Glance, N.S., Nicolov, N., Tseng, B.L. (eds.) ICWSM. The AAAI Press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hannan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neylon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wells</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Structured models for fine-to-coarse sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Meena</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prabhakar</surname>
            ,
            <given-names>T.V.</given-names>
          </string-name>
          :
          <article-title>Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis</article-title>
          .
          <source>In: Advances in Information Retrieval, 29th European Conference on IR Research (ECIR 2007)</source>
          ,
          <fpage>573</fpage>
          -
          <lpage>580</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts</article-title>
          .
          <source>In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Thumbs up? Sentiment classification using machine learning techniques</article-title>
          .
          <source>In: Proceedings of EMNLP</source>
          . pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Taboada</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brooke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tofiloski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voll</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stede</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Lexicon-based methods for sentiment analysis</article-title>
          .
          <source>Comput. Linguist</source>
          .
          <volume>37</volume>
          (
          <issue>2</issue>
          ),
          <fpage>267</fpage>
          -
          <lpage>307</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <article-title>The TripAdvisor website</article-title>
          . http://www.tripadvisor.com (
          <year>2011</year>
          ), [TripAdvisor LLC]
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Latent aspect rating analysis on review text data: A rating regression approach</article-title>
          .
          <source>In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>783</fpage>
          -
          <lpage>792</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Data Mining: Practical Machine Learning Tools and Techniques</article-title>
          . Morgan Kaufmann (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatzivassiloglou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences</article-title>
          .
          <source>In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Grouping product features using semi-supervised learning with soft-constraints</article-title>
          . In:
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.)
          <source>COLING</source>
          <year>2010</year>
          , 23rd International Conference on Computational Linguistics,
          <source>Proceedings of the Conference</source>
          ,
          23-27 August
          <year>2010</year>
          , Beijing, China. pp.
          <fpage>1272</fpage>
          -
          <lpage>1280</lpage>
          . Tsinghua University Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>UCSC on TREC 2006 blog opinion mining</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Adding redundant features for CRFs-based sentence sentiment classification</article-title>
          .
          <source>In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>