<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Alleviating Data Sparsity for Twitter Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hassan Saif</string-name>
          <email>h.saif@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yulan He</string-name>
          <email>y.he@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harith Alani</string-name>
          <email>h.alani@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Knowledge Media Institute, The Open University</institution>
          ,
          <addr-line>Milton Keynes, MK7 6AA</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Twitter has attracted much attention recently as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweets data often faces the data sparsity problem, partly due to the large variety of short and irregular forms introduced into tweets because of the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparsity problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets and then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result. Using sentiment-topic features achieves 86.3% sentiment classification accuracy, which outperforms existing approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Microblogs</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Opinion Mining</kwd>
        <kwd>Twitter</kwd>
        <kwd>Semantic Smoothing</kwd>
        <kwd>Data Sparsity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>A few years after the explosion of Web 2.0, microblogs and social
networks are now considered among the most popular forms
of communication. Through platforms like Twitter and Facebook,
large volumes of information reflecting people’s opinions and attitudes
are published and shared among users every day. Monitoring and
analysing opinions from social media therefore provides enormous
opportunities for both the public and private sectors.</p>
      <p>
        Twitter, now considered one of the most popular
microblogging services, has attracted much attention recently as a hot
research topic in sentiment analysis. Previous work on Twitter
sentiment analysis [
        <xref ref-type="bibr" rid="ref13 ref2 ref5">5, 13, 2</xref>
        ] relies on noisy labels or distant supervision,
for example taking emoticons as the indication of tweet
sentiment, to train supervised classifiers. Other work explores feature
engineering in combination with machine learning methods to
improve sentiment classification accuracy on tweets [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ]. None of
this work explicitly addresses the data sparsity problem, which is
one of the major challenges when dealing with tweets data.
      </p>
      <p>Figure 1 compares the word frequency statistics of the tweets data
we used in our experiments and the movie review data
(http://www.cs.cornell.edu/People/pabo/movie-review-data/). The x-axis
shows the word frequency interval, e.g., words occurring up to 10 times
(1-10), more than 10 times but up to 20 times (10-20), etc. The y-axis
shows the percentage of words falling within each word frequency
interval. It can be observed that the tweets data are sparser than the
movie review data since the former contain more infrequent words,
with 93% of the words in the tweets data occurring fewer than 10
times (cf. 78% in the movie review data).</p>
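      <p>For concreteness, statistics of this kind can be computed with a few lines of Python; the toy corpus and the bin width below are placeholders and do not correspond to the data behind Figure 1.</p>
      <preformat>
from collections import Counter

# Toy corpus of tokenised documents (tweets or movie reviews).
corpus = [["great", "phone"], ["great", "movie"], ["awful", "phone"]]

word_freq = Counter(w for doc in corpus for w in doc)

# Percentage of the vocabulary falling into each frequency interval of width 10.
bins = Counter((freq - 1) // 10 for freq in word_freq.values())
for b in sorted(bins):
    share = 100.0 * bins[b] / len(word_freq)
    print("%d-%d: %.1f%% of words" % (b * 10 + 1, (b + 1) * 10, share))
      </preformat>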
      <p>
        One possible way to alleviate data sparseness is through word
clustering such that words contributing similarly to sentiment
classification are grouped together. In this paper, we propose two
approaches to realise word clustering, one is through semantic
smoothing [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the other is through automatic sentiment-topics
extraction. Semantic smoothing extracts semantically hidden concepts
from tweets and then incorporates them into supervised classifier
training by interpolation. An illustrative example of
semantic smoothing is shown in Figure 2, where the left box lists entities
that appear in the training set together with their occurrence
probabilities in positive and negative tweets. For example, the entities
“iPad”, “iPod” and “MacBook Pro” appear more often in tweets
of positive polarity, and they are all mapped to the semantic concept
“Product/Apple”. As a result, the tweet from the test set “Finally,
I got my iPhone. What a product!” is more likely to have
positive polarity because it contains the entity “iPhone”, which is also
mapped to the concept “Product/Apple”.
      </p>
      <p>
        We propose a semantic interpolation method to incorporate
semantic concepts into sentiment classifier training where we interpolate
the original unigram language model in the Naïve Bayes (NB)
classifier with the generative model of words given semantic concepts.
We show on the Stanford Twitter Sentiment Data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that simply
replacing words with their corresponding semantic concepts reduces
the vocabulary size by nearly 20%. However, the sentiment
classification accuracy drops by 4% compared to the baseline NB model
trained on unigrams solely. With the interpolation method, the
sentiment classification accuracy improves upon the baseline model by
nearly 4%.
      </p>
      <p>
        Our second approach to automatic word clustering is through
sentiment-topic extraction using the previously proposed joint
sentiment-topic (JST) model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The JST model extracts latent topics and the associated topic
sentiment from the tweets data, which are subsequently added into the
original feature space for supervised classifier training. Our
experimental results show that NB learned from
these features outperforms the baseline model trained on unigrams
only and achieves the state-of-the-art result on the original test set
of the Stanford Twitter Sentiment Data.
      </p>
      <p>The rest of the paper is organised as follows. Section 2 outlines
existing work on sentiment analysis with a focus on Twitter sentiment
analysis. Section 3 describes the data used in our experiments.
Section 4 presents our proposed semantic smoothing method. Section
5 describes how we incorporate sentiment-topics extracted from the
JST model into sentiment classifier training. Experimental results
are discussed in Section 6. Finally, we conclude our work and
outline future directions in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Much work has been done in the field of sentiment analysis. Most
of the work follows two basic approaches. The first approach
assumes that semantic orientation of a document is an averaged sum
of the semantic orientations of its words and phrases. The pioneering
work is the point-wise mutual information approach proposed by
Turney [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Also work such as [
        <xref ref-type="bibr" rid="ref16 ref19 ref6 ref8">6, 8, 19, 16</xref>
        ] are good examples
of this lexicon-based approach. The second approach [
        <xref ref-type="bibr" rid="ref12 ref14 ref15 ref23 ref4">15, 14, 4,
23, 12</xref>
        ] addresses the problem as a text classification task where
classifiers are built using one of the machine learning methods and
trained on a dataset using features such as unigrams, bigrams,
part-of-speech (POS) tags, etc. The vast majority of work in sentiment
analysis mainly focuses on the domains of movie reviews, product
reviews and blogs.
      </p>
      <p>Twitter sentiment analysis is considered a much harder problem
than sentiment analysis on conventional text such as review
documents, mainly due to the short length of tweet messages, the
frequent use of informal and irregular words, and the rapid evolution
of language on Twitter. Manually annotated tweets data are impractical to
obtain, so a large amount of work has been conducted on Twitter
sentiment analysis using noisy labels (also called distant supervision).</p>
      <p>
        For example, Go et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used emoticons such as “:-)” and “:(”
to label tweets as positive or negative and train standard classifiers
such as Naïve Bayes (NB), Maximum Entropy (MaxEnt), and
Support Vector Machines (SVMs) to detect the sentiments of tweets.
      </p>
      <p>
        The best result of 83% was reported by MaxEnt using a
combination of unigrams and bigrams. Barbosa and Feng [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] collected their
training data from three different Twitter sentiment detection
websites which mainly use some pre-built sentiment lexicons to label
each tweet as positive or negative. Using SVMs trained from these
noisy labeled data, they obtained 81.3% in sentiment classification
accuracy.
      </p>
      <p>
        While the aforementioned approaches did not detect neutral
sentiment, Pak and Paroubek [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] additionally collected neutral tweets
from Twitter accounts of various newspapers and magazines and
trained a three-class NB classifier which is able to detect neutral
tweets in addition to positive and negative tweets. Their NB was
trained with a combination of n-grams and POS features.
      </p>
      <p>
        Speriosu et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] argued that using noisy sentiment labels may
hinder the performance of sentiment classifiers. They proposed
exploiting the Twitter follower graph to improve sentiment
classification and constructed a graph that has users, tweets, word unigrams,
word bigrams, hashtags, and emoticons as its nodes, which are
connected based on the link existence among them (e.g., users are
connected to tweets they created; tweets are connected to the word
unigrams that they contain, etc.). They then applied a label propagation
method where sentiment labels were propagated from a small set
of nodes seeded with some initial label information throughout the
graph. They claimed that their label propagation method
outperforms MaxEnt trained from noisy labels and obtained an accuracy
of 84.7% on the subset of the Twitter sentiment test set from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>There has also been some work exploring feature engineering
to improve the performance of sentiment classification on tweets.</p>
      <p>
        Agarwal et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] studied using a feature-based model and a
tree-kernel-based model for sentiment classification. They explored
a total of 50 different feature types and showed that both the
feature-based and tree-kernel-based models perform similarly, and both
outperform the unigram baseline.
      </p>
      <p>
        Kouloumpis et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] compared various features including n-gram
features, lexicon features based on the presence of polarity words
from the MPQA subjectivity lexicon (http://www.cs.pitt.edu/mpqa/),
POS features, and microblogging features capturing the presence of
emoticons, abbreviations, and intensifiers (e.g., all-caps and character
repetitions). They found that microblogging features are the most useful
for sentiment classification.
      </p>
    </sec>
    <sec id="sec-2b">
      <title>3. TWITTER SENTIMENT CORPUS</title>
      <p>
        In this work, we used the Stanford Twitter Sentiment Data
(http://twittersentiment.appspot.com/), which was collected between the 6th of April and
the 25th of June 2009 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The training set consists of 1.6 million
tweets with the same number of positive and negative tweets,
labelled using emoticons. For example, a tweet is labelled as positive
if it contains :), :-), : ), :D, or =) and is labelled as negative if it has
:(, :-(, or : (, etc. The original test set consists of 177 negative and
182 positive manually annotated tweets. In contrast to the
training set, which was collected based on specific emoticons, the test
set was collected by querying the Twitter API with specific queries
including product names, companies and people.
      </p>
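      <p>A compact sketch of this emoticon-based distant labelling, using only the emoticon lists quoted above and a hypothetical helper name, is given below; it is illustrative rather than the exact procedure used to build the corpus.</p>
      <preformat>
POSITIVE_EMOTICONS = [":)", ":-)", ": )", ":D", "=)"]
NEGATIVE_EMOTICONS = [":(", ":-(", ": ("]

def noisy_label(tweet):
    """Return a distant-supervision label for a raw tweet, or None."""
    if any(e in tweet for e in POSITIVE_EMOTICONS):
        return "positive"
    if any(e in tweet for e in NEGATIVE_EMOTICONS):
        return "negative"
    return None  # tweet carries no emoticon cue and is not used for training

print(noisy_label("Finally got my iPhone :)"))   # positive
print(noisy_label("stuck at the airport :("))    # negative
      </preformat>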
      <p>We built our training set by randomly selecting 60,000 balanced
tweets from the original training set of the Stanford Twitter
Sentiment Data. Since the original test set only contains a total of 359
tweets, which is relatively small, we enlarged this set by manually
annotating more tweets. To simplify and speed up the annotation
effort, we built Tweenator (http://atkmi.com/tweenator/), a web-based sentiment
annotation tool that allows users to easily assign a sentiment label to tweet
messages, i.e. assign a negative, positive or neutral label to a
certain tweet with regard to its contextual polarity. Using Tweenator,
12 different users annotated an additional 641 tweets from the
remaining original training data. Our final test set contains 1,000
tweet messages, with 527 negative and 473 positive.</p>
      <p>It is worth mentioning that users who participated in the
annotation process reported that, using the annotation interface of
Tweenator shown in Figure 3-a, they were able to annotate 10
tweet messages in approximately 2 to 3 minutes.</p>
      <p>Recently, we have added two new modules to Tweenator that
implement the work that will be described in Section 4. The first
module (see Figure 3-b) provides free-form sentiment detection,
which allows users to detect the polarity of their own textual entries. The
second module is an opinionated tweet message retrieval tool.</p>
    </sec>
    <sec id="sec-3">
      <title>4. SEMANTIC FEATURES</title>
      <p>Twitter is an open social environment where there are no
restrictions on what users can tweet about. Therefore, a huge number of
infrequent named entities, such as people, organizations, and products,
can be found in tweet messages. These infrequent entities
make the data very sparse and hence hinder sentiment
classification performance. Nevertheless, many of these named entities are
semantically related. For example, the entities “iPad” and “iPhone”
can be mapped to the same semantic concept “Product/Apple”.
Inspired by this observation, we propose using semantic features to
alleviate the sparsity problem from tweets data. We first extract
named entities from tweets and map them to their corresponding
semantic concepts. We then incorporate these semantic concepts
into NB classifier training.</p>
    </sec>
    <sec id="sec-4">
      <title>4.1 Semantic Concept Extraction</title>
      <p>We investigated three third-party services to extract entities from
tweets data: Zemanta (http://www.zemanta.com/), OpenCalais
(http://www.opencalais.com/), and AlchemyAPI (http://www.alchemyapi.com/).
A quick manual comparison on 100 randomly selected tweet
messages, with their extracted entities and corresponding semantic
concepts, showed that AlchemyAPI performs better than the others
in terms of the quality and the quantity of the extracted entities.
Hence, we used AlchemyAPI for the extraction of semantic
concepts in this paper.</p>
      <p>Using AlchemyAPI, we extracted a total of 15,139 entities from
the training set, which are mapped to 30 distinct concepts and
extracted 329 entities from the test set, which are mapped to 18
distinct concepts. Table 1 shows the top five extracted concepts from
the training data with the number of entities associated with them.</p>
      <sec id="sec-4-1">
        <title>Concept Person Company City</title>
        <p>Country
Organisation</p>
      </sec>
      <sec id="sec-4-2">
        <title>Number of Entities</title>
        <p>4954
2815
1575
961
614
4.2 Incorporating Semantic Concepts into NB</p>
        <p>Training
The extracted semantic concepts can be incorporated into sentiment
classifier training in a naive way where entities are simply replaced
by their mapped semantic concepts in the tweets data. For example,
all the entities such as “iPhone”, “iPad”, and “iPod” are replaced
by the semantic concept “Product/Apple”. A more principled way
to incorporate semantic concepts is through interpolation. Here, we
propose interpolating the unigram language model with the
generative model of words given semantic concepts in NB training.</p>
      <p>In NB, the assignment of a sentiment class c to a given tweet w can
be computed as</p>
      <p>ĉ = arg max_{c ∈ C} P(c|w) = arg max_{c ∈ C} P(c) ∏_{1 ≤ i ≤ Nw} P(wi|c),</p>
      <p>where Nw is the total number of words in tweet w, P(c) is the
prior probability of a tweet appearing in class c, and P(wi|c) is the
conditional probability of word wi occurring in a tweet of class c.
In multinomial NB, P(c) can be estimated by P(c) = Nc/N,
where Nc is the number of tweets in class c and N is the total
number of tweets. P(wi|c) can be estimated using maximum
likelihood with Laplace smoothing:</p>
      <p>P(w|c) = (N(w, c) + 1) / (∑_{w′ ∈ V} N(w′|c) + |V|),</p>
      <p>where N(w, c) is the occurrence frequency of word w in all
training tweets of class c and |V| is the number of words in the
vocabulary. Although using Laplace smoothing helps to prevent zero
probabilities for “unseen” words, it assigns equal prior
probabilities to all of these words.</p>
      <p>We propose a new smoothing method where we interpolate the
unigram language model in NB with the generative model of words
given semantic concepts. Thus, the new class model with semantic
smoothing has the following formula:</p>
      <p>Ps(w|c) = (1 − α) Pu(w|c) + α ∑_j P(w|sj) P(sj|c),</p>
      <p>where Ps(w|c) is the unigram class model with semantic
smoothing, Pu(w|c) is the unigram class model with the maximum likelihood
estimate, and sj is the j-th concept of the word w. P(sj|c) is the
distribution of semantic concepts in the training data of a given class,
which can be computed via maximum likelihood estimation, and P(w|sj)
is the distribution of words in the training data given a concept,
which can also be computed via maximum likelihood estimation.
Finally, the coefficient α is used to control the influence of the
semantic mapping in the new class model. By setting α to 0, the class
model becomes a unigram language model without any semantic
interpolation. On the other hand, setting α to 1 reduces the class
model to a semantic mapping model. In this work, α was
empirically set to 0.5.</p>
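      <p>The sketch below, which again uses hypothetical variable names and a toy entity-to-concept mapping rather than the AlchemyAPI output, shows how the interpolated class model Ps(w|c) could be assembled from the two maximum likelihood distributions P(w|s) and P(s|c) with α = 0.5.</p>
      <preformat>
from collections import Counter, defaultdict

ALPHA = 0.5  # interpolation weight, empirically set to 0.5 in this work

# Toy mapping from entities to semantic concepts (illustrative only).
concept_of = {"iphone": "Product/Apple", "ipad": "Product/Apple"}

train = [(["love", "my", "iphone"], "pos"),
         (["my", "ipad", "is", "broken"], "neg")]

# Count words per class, concepts per class, and words per concept.
word_c = defaultdict(Counter)     # N(w, c)
concept_c = defaultdict(Counter)  # N(s, c)
word_s = defaultdict(Counter)     # N(w, s)
for words, c in train:
    for w in words:
        word_c[c][w] += 1
        s = concept_of.get(w)
        if s is not None:
            concept_c[c][s] += 1
            word_s[s][w] += 1

def p_u(w, c):
    # Pu(w|c): maximum likelihood unigram class model
    total = sum(word_c[c].values())
    return word_c[c][w] / total if total else 0.0

def p_concept(w, c):
    # sum_j P(w|s_j) P(s_j|c) over the concepts s_j mapped to word w
    s = concept_of.get(w)
    if s is None or not concept_c[c]:
        return 0.0
    p_w_given_s = word_s[s][w] / sum(word_s[s].values())
    p_s_given_c = concept_c[c][s] / sum(concept_c[c].values())
    return p_w_given_s * p_s_given_c

def p_smoothed(w, c):
    # Ps(w|c) = (1 - alpha) Pu(w|c) + alpha sum_j P(w|s_j) P(s_j|c)
    return (1 - ALPHA) * p_u(w, c) + ALPHA * p_concept(w, c)

print(p_smoothed("iphone", "pos"), p_smoothed("iphone", "neg"))
      </preformat>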
    </sec>
    <sec id="sec-4b">
      <title>5. SENTIMENT-TOPIC FEATURES</title>
      <p>
        The joint sentiment-topic (JST) model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a four-layer
generative model which allows the detection of both sentiment and topic
simultaneously from text. The generative procedure under JST
boils down to three stages. First, one chooses a sentiment label
l from the per-document sentiment distribution πd. Following that,
one chooses a topic z from the topic distribution θd,l, where θd,l is
conditioned on the sampled sentiment label l. Finally, one draws a
word wi from the per-corpus word distribution φl,z, conditioned on
both topic z and sentiment label l. The JST model does not require
labelled documents for training. The only supervision is word prior
polarity information, which can be obtained from publicly available
sentiment lexicons such as the MPQA subjectivity lexicon.
      </p>
      <p>
        We train JST on the training set with tweet sentiment labels
discarded. The resulting model assigns each word in the tweets
a sentiment label and a topic label. Hence, JST essentially
clusters different words sharing similar sentiment and topic. We list
some of the topic words extracted by JST in Table 2. Words in each
cell are grouped under one topic; the upper half of the table
shows topic words bearing positive sentiment while the lower half
shows topic words bearing negative polarity. It can be observed that
the word groups under different sentiment labels and topics are quite
informative and coherent. For example, Topic 3 under positive sentiment is
related to a good music album, while Topic 1 under negative
sentiment is about a complaint of feeling sick, possibly due to cold and
headache.
      </p>
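      <p>The three-stage generative story can be illustrated with the short Python sketch below; the Dirichlet hyperparameters, vocabulary and dimensions are placeholders chosen by us to keep the sketch self-contained, and the actual JST model is estimated from data (typically with Gibbs sampling) rather than sampled forward like this.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
S, T, vocab = 3, 5, ["love", "sick", "album", "exam", "movie"]

# Per-document sentiment distribution pi_d, per-sentiment topic
# distributions theta_{d,l}, and per-corpus word distributions phi_{l,z}.
pi_d = rng.dirichlet(np.ones(S))
theta_dl = rng.dirichlet(np.ones(T), size=S)
phi_lz = rng.dirichlet(np.ones(len(vocab)), size=(S, T))

def generate_word():
    l = rng.choice(S, p=pi_d)              # 1. choose a sentiment label l
    z = rng.choice(T, p=theta_dl[l])       # 2. choose a topic z given l
    w = rng.choice(vocab, p=phi_lz[l, z])  # 3. draw a word w given l and z
    return l, z, w

print([generate_word() for _ in range(5)])
      </preformat>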
      <table-wrap id="tbl2">
        <label>Table 2</label>
        <caption>
          <p>Examples of topic words extracted by JST. The upper row lists topic words bearing positive sentiment and the lower row lists topic words bearing negative sentiment.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Sentiment</th><th>Topic 1</th><th>Topic 2</th><th>Topic 3</th><th>Topic 4</th><th>Topic 5</th></tr>
          </thead>
          <tbody>
            <tr><td>Positive</td><td>dream sweet train angel love goodnight free club</td><td>bought short hair love wear shirt dress photo</td><td>song listen love music play album band guitar</td><td>eat food coffe dinner drink yummi chicken tea</td><td>movi show award live night mtv concert vote</td></tr>
            <tr><td>Negative</td><td>feel today hate sick cold suck weather headache</td><td>miss sad cry girl gonna talk bore feel</td><td>rain bike car stop ride hit drive run</td><td>exam school week tomorrow luck suck final studi</td><td>job hard find hate interview lost kick problem</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Inspired by the above observations, grouping words under the same
topic and bearing similar sentiment could potentially reduce data
sparseness in Twitter sentiment classification. Hence, we extract
sentiment-topics from tweets data and augment them as additional
features into the original feature space for NB training. Algorithm 1
shows how to perform NB training with sentiment-topics extracted
from JST. The training set consists of labelled tweets, Dtrain =
{(wn, cn) ∈ W × C : 1 ≤ n ≤ Ntrain}, where W is the input
space and C is a finite set of class labels. The test set contains
tweets without labels, Dtest = {wn ∈ W : 1 ≤ n ≤ Ntest}.
A JST model is first learned from the training set and then used to
infer the sentiment-topics for each tweet in the test set. The original tweets
are augmented with those sentiment-topics as shown in Step 4 of
Algorithm 1, where li_zi denotes the combination of sentiment label
li and topic zi for word wi. Finally, an optional feature selection
step can be performed according to the information gain criterion,
and a classifier is then trained from the training set with the new
feature representation.</p>
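      <p>A minimal sketch of the augmentation step (Step 4 of Algorithm 1) is given below; the per-word sentiment and topic assignments would in practice come from the trained JST model, so the labels shown here are hard-coded purely for illustration.</p>
      <preformat>
# Original bag-of-words representation of a tweet.
tweet = ["finally", "got", "my", "iphone"]

# Hypothetical JST assignments: (sentiment label, topic index) per word.
jst_assignments = [("pos", 2), ("pos", 2), ("pos", 1), ("pos", 2)]

# Step 4: append one l_z feature per word to the original features.
augmented = tweet + ["%s_%d" % (l, z) for l, z in jst_assignments]
print(augmented)
# ['finally', 'got', 'my', 'iphone', 'pos_2', 'pos_2', 'pos_1', 'pos_2']
      </preformat>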
    </sec>
    <sec id="sec-5">
      <title>6. EXPERIMENTAL RESULTS</title>
      <p>In this section, we present the results obtained on the twitter
sentiment data using both semantic features and sentiment-topic
features and compare with the existing approaches.</p>
    </sec>
    <sec id="sec-6">
      <title>6.1 Pre-processing</title>
      <p>The raw tweets data are very noisy: they contain a large number of
irregular words and non-English characters. Tweets data also have some
unique characteristics which can be used to reduce the feature space
through the following pre-processing steps:
• All URL links in the corpus are replaced with the term “URL”.
• The number of letters that are repeated more than twice in a
word is reduced to two. For example, the word “loooooveeee”
becomes “loovee” after reduction.
• All Twitter hashtags (which start with the # symbol), all single
characters and digits, and all non-alphanumeric characters are
removed.</p>
      <p>Algorithm 1: NB training with sentiment-topics extracted from JST.
Input: the training set Dtrain and the test set Dtest.
Output: an NB sentiment classifier.
1: Train a JST model on Dtrain with the document labels discarded
2: Infer sentiment-topics for Dtest
3: for each tweet wn = (w1, w2, ..., wm) ∈ {Dtrain, Dtest} do
4: Augment the tweet with the sentiment-topics generated by JST,
   w′n = (w1, w2, ..., wm, l1_z1, l2_z2, ..., lm_zm)
5: end for
6: Create a new training set D′train = {(w′n, cn) : 1 ≤ n ≤ Ntrain}
7: Create a new test set D′test = {w′n : 1 ≤ n ≤ Ntest}
8: Perform feature selection using IG on D′train
9: Return an NB classifier trained on D′train</p>
      <p>[Table 3: reduction of the feature space by each pre-processing step: None, Username, Hashtag, URLs, Repeated Letters, Digits, Symbols, All.]</p>
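      <p>A minimal sketch of the pre-processing steps listed above, using our own regular expressions rather than the exact rules applied in the experiments, is shown below.</p>
      <preformat>
import re

def preprocess(tweet):
    # Replace all URL links with the term "URL".
    tweet = re.sub(r"http\S+|www\.\S+", "URL", tweet)
    # Reduce letters repeated more than twice to two occurrences.
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)
    # Remove hashtags, then digits and other symbols, then single characters.
    tweet = re.sub(r"#\w+", " ", tweet)
    tweet = re.sub(r"[^A-Za-z ]", " ", tweet)
    tweet = re.sub(r"\b\w\b", " ", tweet)
    return re.sub(r"\s+", " ", tweet).strip()

print(preprocess("I loooooveeee my new phone!!! #happy http://t.co/x 1"))
# 'loovee my new phone URL'
      </preformat>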
    </sec>
    <sec id="sec-7">
      <title>6.2 Semantic Features</title>
      <p>We have tested both the NB classifier from WEKA
(http://www.cs.waikato.ac.nz/ml/weka/) and the maximum entropy (MaxEnt)
model from MALLET (http://mallet.cs.umass.edu/). Our results show
that NB consistently outperforms MaxEnt. Hence, we use NB as
our baseline model. Table 4 shows that NB trained on
unigrams only obtains a sentiment classification accuracy of 80.7%.</p>
      <p>We extracted semantic concepts from tweets data using AlchemyAPI
and then incorporated them into NB training in the
following two simple ways. One is to replace all entities in the tweets
corpus with their corresponding semantic concepts (semantic
replacement). The other is to augment the original feature space with
semantic concepts as additional features for NB training
(semantic augmentation). With semantic replacement, the feature space
shrank by nearly 20%. However, sentiment
classification accuracy drops by 4% compared to the baseline, as shown
in Table 4. The performance degradation can be explained by the
information loss caused by replacing specific entities with more general
semantic concepts, which subsequently hurts NB performance. Augmenting
the original feature space with semantic concepts performs slightly
better than semantic replacement, though it still performs worse
than the baseline.</p>
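      <p>The two simple strategies can be sketched as follows; the entity-to-concept dictionary is a stand-in for the AlchemyAPI output, and the helper names are ours.</p>
      <preformat>
# Hypothetical mapping produced by an entity extractor such as AlchemyAPI.
concept_of = {"iphone": "Product/Apple", "ipad": "Product/Apple",
              "london": "City"}

def semantic_replacement(tokens):
    # Replace each recognised entity with its mapped semantic concept.
    return [concept_of.get(t, t) for t in tokens]

def semantic_augmentation(tokens):
    # Keep the original tokens and append the mapped concepts as extra features.
    return tokens + [concept_of[t] for t in tokens if t in concept_of]

tweet = ["finally", "got", "my", "iphone"]
print(semantic_replacement(tweet))   # ['finally', 'got', 'my', 'Product/Apple']
print(semantic_augmentation(tweet))  # ['finally', 'got', 'my', 'iphone', 'Product/Apple']
      </preformat>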
      <p>With semantic interpolation, semantic concepts were incorporated
into NB training taking into account the generative probability of
words given concepts. The method improves upon the baseline
model and gives a sentiment classification accuracy of 84%.</p>
      <sec id="sec-7-1">
        <title>Method</title>
        <p>Unigrams
Semantic replacement
Semantic augmentation
Semantic interpolation
Sentiment-topic features</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6.3 Sentiment-Topic Features</title>
      <p>
        To run JST on the tweets data, the only parameter we need to set
is the number of topics T . It is worth noting that the total
number of the sentiment-topics that will be extracted is 3 × T . For
example, when T is set to 50, there are 50 topics under each of
positive, negative and neutral sentiment labels. Hence the total
number of sentiment-topic features is 150. We augment the original
bag-of-words representation of the tweet messages with the extracted
sentiment-topics. Figure 4 shows the classification accuracy of NB
trained from the augmented features by varying the number of
topics from 1 to 65. The initial sentiment classification accuracy is
81.1% with topic number 1. Increasing the number of topics leads
to the increase of classification accuracy with the peak value of
82.3% being reached at topic number 50. Further increasing topic
numbers degrades the classifier performance.
</p>
    </sec>
    <sec id="sec-8b">
      <title>6.4 Comparison with Existing Approaches</title>
      <p>
In order to compare our proposed methods with the existing
approaches, we also conducted experiments on the original Stanford
Twitter Sentiment test set which consists of 177 negative and 182
positive tweets. The results are shown in Table 5. The sentiment
classification accuracy of 83% reported in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was obtained using
MaxEnt trained on a combination of unigrams and bigrams. It
should be noted that while Go et al. used 1.6 million tweets for
training, we only used a subset of 60,000 tweets as our training set.
Speriosu et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] tested on a subset of the Stanford Twitter
Sentiment test set with 75 negative and 108 positive tweets. They
reported the best accuracy of 84.7% using label propagation on a
rather complicated graph that has users, tweets, word unigrams,
word bigrams, hashtags, and emoticons as its nodes.
      </p>
      <p>It can be seen from Table 5 that semantic replacement performs
worse than the baseline. Semantic augmentation does not significantly
decrease the classification accuracy, though it
does not lead to improved performance either. Our semantic
interpolation method rivals the best result reported on the Stanford
Twitter Sentiment test set. Using the sentiment-topic features, we
achieved 86.3% sentiment classification accuracy, which
outperforms the existing approaches.</p>
      <sec id="sec-8-1">
        <title>Method</title>
        <p>Unigrams
Semantic replacement
Semantic augmentation
Semantic interpolation
Sentiment-topic features
(Go et al., 2009)
(Speriosu et al., 2011)</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>6.5 Discussion</title>
      <p>We have explored incorporating semantic features and
sentiment-topic features for Twitter sentiment classification. While
simple semantic replacement or augmentation does not improve
sentiment classification performance, semantic
interpolation improves upon the baseline NB model trained on unigrams
only by 3%. Augmenting the feature space with sentiment-topics
generated from JST also increases sentiment
classification accuracy compared to the baseline. On the original Stanford
Twitter Sentiment test set, NB classifiers learned from
sentiment-topic features outperform the existing approaches.</p>
      <p>We have a somewhat contradictory observation here. Using
sentiment-topic features performs worse than using semantic features on the
test set comprising 1,000 tweets, but the reverse is observed on
the original Stanford Twitter Sentiment test set with 359 tweets.
We therefore conducted further experiments to compare these two
approaches.</p>
      <p>
We performed feature selection using information gain (IG) on the
training set. We calculated the IG value for each feature and sorted
them in descending order based on IG. Using each distinct IG value
as a threshold, we ended up with different sets of features to train a
classifier. Figure 5 shows the sentiment classification accuracy on
the 1,000-tweet test set versus different numbers of features. It can be
observed that there is an abrupt jump on the x-axis from around 5,600
features to over 30,000 features. Using sentiment-topic
features consistently performs better than using semantic features.
With as few as 500 features, augmenting the original feature space
with sentiment-topics already achieves 80.2% accuracy. Although,
with all the features included, NB trained with semantic features
performs better than NB trained with sentiment-topic features, we can still
conclude that sentiment-topic features should be preferred
over semantic features for the sentiment classification task, since they
give much better results with far fewer features.</p>
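      <p>The feature selection procedure described above can be sketched as follows; the binary bag-of-words toy data and the IG computation are our own minimal rendering of the standard definition IG(f) = H(C) - H(C|f), not the exact code used in the experiments.</p>
      <preformat>
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_present, labels):
    # IG(f) = H(C) - H(C|f), with f treated as a binary presence feature.
    h_c = entropy(labels)
    h_c_given_f = 0.0
    for value in (True, False):
        subset = [l for p, l in zip(feature_present, labels) if p == value]
        if subset:
            h_c_given_f += len(subset) / len(labels) * entropy(subset)
    return h_c - h_c_given_f

# Toy data: whether each tweet contains the feature, and its class label.
labels = ["pos", "pos", "neg", "neg"]
contains_love = [True, True, False, False]
contains_the = [True, False, True, False]

# Rank features by IG; each distinct IG value can then act as a threshold.
scores = {"love": information_gain(contains_love, labels),
          "the": information_gain(contains_the, labels)}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
      </preformat>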
    </sec>
    <sec id="sec-10">
      <title>7. CONCLUSIONS AND FUTURE WORK</title>
      <p>
Twitter is an open social environment where users can tweet about
different topics within the 140-character limit. This poses a
significant challenge to Twitter sentiment analysis since tweets data are
often noisy and contain a large number of irregular words and
non-English symbols and characters. Pre-processing by filtering some
of the non-standard English words leads to a significant reduction
of the original feature space by nearly 61.0% on the Twitter
sentiment data. Nevertheless, the pre-processed tweets data still contain
a large number of rare words.</p>
      <p>In this paper, we have proposed two sets of features to alleviate the
data sparsity problem in Twitter sentiment classification: semantic
features and sentiment-topic features. Our experimental results on
the Twitter sentiment data show that while both methods improve
upon the baseline Naïve Bayes model trained from unigram
features only, using sentiment-topic features gives much better results
than using semantic features, while requiring fewer features.</p>
      <p>Compared to the existing approaches to Twitter sentiment analysis,
which either rely on sophisticated feature engineering or
complicated learning procedures, our approaches are much simpler
and more straightforward, and yet attain comparable performance.</p>
      <p>There are a few possible directions we would like to explore as
future work. First, in the semantic method all entities were simply
replaced by their associated semantic concepts. It is worth performing
a selective statistical replacement, determined by the
contribution of each concept towards making a better classification
decision. Second, sentiment-topics generated by the JST model were
simply augmented into the original feature space of tweets data.
Attaching a weight to each extracted sentiment-topic feature, in order
to control the impact of the newly added features, could lead to better
performance. Finally, the performance of the NB
classifiers learned from semantic features depends on the quality of the
entity extraction process and the entity-concept mapping method. It
is worth investigating a filtering method which can automatically
filter out low-confidence semantic concepts.</p>
      <p>Acknowledgement This work is partially funded by the EU project
ROBUST (grant number 257859).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>AGARWAL</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>XIE</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VOVSHA</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>RAMBOW</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND PASSONNEAU</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <article-title>Sentiment analysis of twitter data</article-title>
          .
          <source>In Proceedings of the ACL 2011 Workshop on Languages in Social Media</source>
          (
          <year>2011</year>
          ), pp.
          <fpage>30</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>BARBOSA</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND FENG</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Robust sentiment detection on twitter from biased and noisy data</article-title>
          .
          <source>In Proceedings of COLING</source>
          (
          <year>2010</year>
          ), pp.
          <fpage>36</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>BHUIYAN</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Social media and its effectiveness in the political reform movement in egypt</article-title>
          .
          <source>Middle East Media Educator</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>2011</year>
          ),
          <fpage>14</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>BOIY</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>HENS</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>DESCHACHT</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          , AND MOENS,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Automatic sentiment analysis in on-line text</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Electronic Publishing</source>
          (
          <year>2007</year>
          ), pp.
          <fpage>349</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>GO</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>BHAYANI</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ,
          <string-name>
            <surname>AND HUANG</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>CS224N Project Report</source>
          , Stanford (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>HATZIVASSILOGLOU</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND WIEBE</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Effects of adjective orientation and gradability on sentence subjectivity</article-title>
          .
          <source>In Proceedings of the 18th conference on Computational linguistics-Volume</source>
          <volume>1</volume>
          (
          <year>2000</year>
          ),
          <article-title>Association for Computational Linguistics</article-title>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>HE</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , AND SAIF,
          <string-name>
            <surname>H.</surname>
          </string-name>
          <article-title>Quantising Opinions for Political Tweets Analysis</article-title>
          .
          <source>In Proceeding of the The eighth international conference on Language Resources</source>
          and
          <string-name>
            <surname>Evaluation (LREC) - In Submission</surname>
          </string-name>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>HU</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , AND LIU,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <article-title>Mining and summarizing customer reviews</article-title>
          .
          <source>In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          (
          <year>2004</year>
          ), ACM, pp.
          <fpage>168</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>HUSSAIN</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND HOWARD</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>The role of digital media</article-title>
          .
          <source>Journal of Democracy 22</source>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>35</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>KOULOUMPIS</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , WILSON,
          <string-name>
            <surname>T.</surname>
          </string-name>
          , AND MOORE,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Twitter sentiment analysis: The good the bad and the omg</article-title>
          !
          <source>In Proceedings of the ICWSM</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>LIN</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , AND HE,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <article-title>Joint sentiment/topic model for sentiment analysis</article-title>
          .
          <source>In Proceeding of the 18th ACM conference on Information and knowledge management</source>
          (
          <year>2009</year>
          ), ACM, pp.
          <fpage>375</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>NARAYANAN</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , LIU,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ,
          <article-title>AND CHOUDHARY, A. Sentiment Analysis of Conditional Sentences</article-title>
          . In EMNLP (
          <year>2009</year>
          ), pp.
          <fpage>180</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>PAK</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , AND PAROUBEK,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          .
          <source>Proceedings of LREC</source>
          <year>2010</year>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>PANG</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , AND LEE,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <article-title>A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts</article-title>
          .
          <source>In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics</source>
          (
          <year>2004</year>
          ),
          <article-title>Association for Computational Linguistics</article-title>
          , p.
          <fpage>271</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>PANG</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LEE</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>AND VAITHYANATHAN, S. Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-</source>
          Volume
          <volume>10</volume>
          (
          <year>2002</year>
          ),
          <article-title>Association for Computational Linguistics</article-title>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>READ</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , AND CARROLL,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Weakly supervised techniques for domain-independent sentiment classification</article-title>
          .
          <source>In Proceeding of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion</source>
          (
          <year>2009</year>
          ), pp.
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>SAIF</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , HE,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          , AND ALANI,
          <string-name>
            <surname>H.</surname>
          </string-name>
          <article-title>Semantic Smoothing for Twitter Sentiment Analysis</article-title>
          .
          <source>In Proceeding of the 10th International Semantic Web Conference (ISWC)</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>SPERIOSU</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>SUDAN</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>UPADHYAY</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          , AND BALDRIDGE,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Twitter polarity classification with label propagation over lexical links and the follower graph</article-title>
          .
          <source>Proceedings of the EMNLP First workshop on Unsupervised Learning in NLP (</source>
          <year>2011</year>
          ),
          <fpage>53</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>TABOADA</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND GRIEVE</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Analyzing appraisal automatically</article-title>
          .
          <source>In Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (AAAI Technical Report SS-04-07)</source>
          (
          <year>2004</year>
          ), pp.
          <fpage>158</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>TURNEY</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews</article-title>
          .
          <source>In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02)</source>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>WARD</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AND OSTROM</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>The internet as information minefield:: An analysis of the source and content of brand information yielded by net searches</article-title>
          .
          <source>Journal of Business research 56</source>
          ,
          <volume>11</volume>
          (
          <year>2003</year>
          ),
          <fpage>907</fpage>
          -
          <lpage>914</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>YOON</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>GUFFEY</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>AND</article-title>
          <string-name>
            <surname>KIJEWSKI</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>The effects of information and company reputation on intentions to buy a business service</article-title>
          .
          <source>Journal of Business Research</source>
          <volume>27</volume>
          ,
          <issue>3</issue>
          (
          <year>1993</year>
          ),
          <fpage>215</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>ZHAO</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , LIU,
          <string-name>
            <surname>K.</surname>
          </string-name>
          , AND WANG,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Adding redundant features for CRFs-based sentence sentiment classification</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2008</year>
          ), pp.
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>