=Paper=
{{Paper
|id=Vol-1743/paper9
|storemode=property
|title=Dictionary-based Sentiment Analysis Applied to Specific Domain using a Web Mining Approach
|pdfUrl=https://ceur-ws.org/Vol-1743/paper9.pdf
|volume=Vol-1743
|authors=Laura Cruz-Quispe,José Eduardo Ochoa Luna,Mathieu Roche,Pascal Poncelet
|dblpUrl=https://dblp.org/rec/conf/simbig/QuispeLRP16
}}
==Dictionary-based Sentiment Analysis Applied to Specific Domain using a Web Mining Approach==
<pdf width="1500px">https://ceur-ws.org/Vol-1743/paper9.pdf</pdf>
<pre>
    Dictionary-Based Sentiment Analysis applied to specific domain using a
                           Web Mining approach
                    Laura Cruz                                José Ochoa
      Universidad Nacional de San Agustín, Perú Universidad Católica San Pablo, Perú
              lcruzq@unsa.edu.pe                      jeochoa@ucsp.edu.pe

                  Mathieu Roche                                     Pascal Poncelet
                       TETIS                                         LIRMM, Cnrs
                     Cirad, Cnrs                              Université Montpellier, France
             AgroParisTech, Irstea, France                  pascal.poncelet@lirmm.fr
            mathieu.roche@cirad.fr

                                  Abstract

     Web and social media content has grown exponentially in recent years,
     providing documents that express opinions on many topics. This content
     is a rich source for Natural Language Processing tasks, in particular
     Sentiment Analysis. In this work, we aim to construct a sentiment
     dictionary from words obtained from web pages related to a specific
     domain. To do so, we correlate candidate opinion words, seed words and
     the domain using the AcroDefMI3 and TrueSkill methods. This
     dictionary-based approach is compared to the SentiWordNet lexical
     resource. Experimental results show the suitability of our approach
     for multiple domains and infrequent opinion words.

1    Introduction
Web and social media content has grown exponentially in recent years and
constitutes a rich source for Sentiment Analysis tasks. Companies are
increasingly using this content to make better decisions (Marrese-Taylor et
al., 2013), and users turn to social networking sites to express thoughts and
opinions about products (Amine et al., 2014). In this context, Sentiment
Analysis involves identifying the polarity of opinionated texts. These texts
are highly unstructured and thus require Natural Language Processing
techniques (Varghese and Jayasree, 2013). As a rule, documents contain
opinionated text about several topics. Words used to express opinions about
some topics can be specific and highly correlated to a particular domain
(Duthil et al., 2011). For instance, while we may find the sentence "The
chair is black", such an adjective would be unusual in a movie domain. To
tackle these issues, both machine learning and dictionary-based approaches
have been proposed in the literature. A machine learning method that applies
text-categorization techniques was proposed by Pang and Lee (2004). In that
method, graphs, a minimum-cut formulation, context and domain are considered
in order to extract the subjective portions of documents.
   On the other hand, dictionary-based approaches are unsupervised in nature.
In general, these methods assume that positive (negative) adjectives appear
more frequently near a positive (negative) seed word (Harb et al., 2008). An
unsupervised learning algorithm for classifying reviews (thumbs up or thumbs
down) was adopted by Turney (2002) and Wang and Araki (2007). A review is
classified by the average semantic orientation of its phrases that contain
adjectives or adverbs. The semantic orientation of a phrase is computed as
the mutual information between the phrase and the word excellent minus the
mutual information between the phrase and the word poor. Therefore, a phrase
has a positive semantic orientation when it has good associations and a
negative semantic orientation when it has bad associations, as shown by
Equation 1.

   SO(phrase) = \log \frac{hits(phrase NEAR excellent) \cdot hits(poor)}
                          {hits(phrase NEAR poor) \cdot hits(excellent)}    (1)

   In this work, words used to express opinions
are learned. To do so, positive and negative seed words (e.g., good,
excellent, bad) are used to extract adjectives that occur near them. To
correlate candidate words, seed words and the domain, the AcroDefMI3 and
TrueSkill methods are proposed. Experimental results show the suitability of
our proposal. Several domains (e.g., movies, agriculture) were used to
compare our approach to SentiWordNet.
   The paper is organized as follows. The methodology is presented in
Section 2. The experimental setup is described in Section 3. In Section 4, we
present and discuss the obtained results. Concluding remarks are presented in
Section 5.

2     Methodology
The proposed process is depicted in Figure 1. The steps are summarized as
follows:
    1. A corpus for a specific domain, containing positive and negative
       opinions, is acquired from the Web.
    2. Each document is pre-processed to get its text, removing HTML tags and
       scripts.
    3. Opinion adjectives and nouns are extracted using POS-Tagging and the
       Window Size algorithm.
    4. The correlation score of a given word with a seed word and the domain
       is computed using AcroDefMI3 and TrueSkill. Lexicons are inferred from
       these correlation scores, which identify the semantic orientation of
       each extracted word. Words with a high correlation score are selected.

   [Figure 1: pipeline diagram. Labels in the figure: Domain, Seed Word;
   Corpus Acquisition (Web Pages); Pre-processing (Text); Word Extraction
   (POS-Tag, Window Size (N)); Nouns, Adjectives; Word Selection (Score: MI3,
   TrueSkill, SentiWordNet); Dictionary+, Dictionary-.]
   Figure 1: Lexicons are inferred from Web pages correlated to seed words
   and domains, via extraction and processing of candidate words.

We perform experiments over two domains: an Agricultural domain (opinions
extracted from Twitter) and a Movie domain [1] (data set introduced by Pang
et al. (2002)). Further details are given in the next sections.
   [1] http://www.cs.cornell.edu/People/pabo/movie-review-data/

2.1 Corpus Acquisition
Some words can express a neutral, positive or negative opinion in a specific
domain, such as:

    Neutral  → I attend scientific conferences.
    Positive → The list shows the scientific discoveries.
    Positive → He made a good scientific discovery.

These examples show that a given word, for instance scientific, can be highly
correlated to a particular domain (Harb et al., 2008). The first example is
considered a neutral opinion. Conversely, the second example is considered a
positive opinion. The third example is also a positive opinion because of the
word good. Thus, some words are useful for learning opinion words related to
a given domain. We can define a seed word, such as good, that helps us find
other opinion words.
   Lexicons are built using selected words from a web page corpus. Web pages
are retrieved using the Bing search engine. The queries used to retrieve
these web pages combine seed words and domain keywords. We have positive and
negative seed words, P = {good, nice, excellent, positive, fortunate,
correct, superior} and Q = {bad, nasty, poor, negative, unfortunate, wrong,
inferior}, respectively.
   A positive (negative) seed word ensures a positive (negative) web page
about a query domain, because all opposite seed words are excluded from that
query. For example, the following query can be used for retrieving positive
pages:

    query+ = +opinion +review +gmo +good -bad -nasty -poor -negative
             -unfortunate -wrong -inferior

   Thus, we have positive and negative web pages, denoted by corpus+ and
corpus- respectively. Each corpus is related to a seed word and a given
domain. In the next section we extract words near seed words for each web
page corpus using POS-Tagging and the Window Size algorithm.
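The query construction described in Section 2.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `build_query`, the `context_terms` default, and the `+term` / `-term` operator syntax are assumptions based on the single example query given in the text.

```python
# Seed word sets P and Q as listed in Section 2.1.
POSITIVE_SEEDS = ["good", "nice", "excellent", "positive",
                  "fortunate", "correct", "superior"]
NEGATIVE_SEEDS = ["bad", "nasty", "poor", "negative",
                  "unfortunate", "wrong", "inferior"]


def build_query(seed, domain_keyword, opposite_seeds,
                context_terms=("opinion", "review")):
    """Hypothetical helper: require context terms, domain keyword and seed,
    and exclude every seed word of the opposite polarity."""
    required = [f"+{term}" for term in (*context_terms, domain_keyword, seed)]
    excluded = [f"-{term}" for term in opposite_seeds]
    return " ".join(required + excluded)


# One positive and one negative query for the domain keyword "gmo".
positive_query = build_query("good", "gmo", NEGATIVE_SEEDS)
negative_query = build_query("bad", "gmo", POSITIVE_SEEDS)
print(positive_query)
print(negative_query)
```

Building one query per (seed word, domain keyword) pair keeps each retrieved page tied to a single seed word, which is what the later scoring steps rely on.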


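The extraction step announced at the end of Section 2.1 (the Window Size algorithm, given as Algorithm 1 in Section 2.2) can be sketched in Python as below. The token list is a hypothetical, already POS-filtered input (adjectives and nouns only), and the bounds check and the skipping of neighbouring seed words are assumptions that the pseudocode does not state.

```python
def window_size(tokens, seed_words, K):
    """Collect the words found within K positions of any seed word.

    `tokens` is assumed to contain only adjectives and nouns, in page order.
    """
    seeds = {seed.lower() for seed in seed_words}
    candidates = []
    for index, token in enumerate(tokens):
        if token.lower() not in seeds:
            continue
        for k in range(1, K + 1):
            # Look k positions to the left and to the right of the seed word.
            for pos in (index - k, index + k):
                in_range = 0 <= pos < len(tokens)
                if in_range and tokens[pos].lower() not in seeds:
                    candidates.append(tokens[pos])
    return candidates


# Hypothetical filtered page; "excellent" is a seed word from P.
tokens = ["organic", "farmers", "excellent", "yield", "cheap"]
near_k1 = window_size(tokens, {"excellent"}, K=1)
near_k2 = window_size(tokens, {"excellent"}, K=2)
print(near_k1)
print(near_k2)
```

A larger K returns more candidates but also more words that are only loosely related to the seed word, which is why the candidates are scored afterwards.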
2.2 Word extraction
Opinion words near a seed word can have the same polarity (Roche and Prince,
2007; Harb et al., 2008). We use the same approach to extract candidate
opinion words. To identify opinion words (nouns and adjectives) in the web
page corpus, TreeTagger [2] is used. Beforehand, HTML tags, scripts, blank
spaces and stop words [3] are removed from the web pages. To get the words
near each seed word, a Window Size algorithm is used (Algorithm 1). The
Window Size algorithm looks for opinion words on both the left and right
sides of a seed word, given a distance K. This distance is the number of
opinion words to the left (right) of a seed word in a web page corpus.
   [2] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
   [3] http://www.ranks.nl/stopwords

Algorithm 1 The Window Size Algorithm
Require: seed words, corpus, K
Ensure: opinion words
    words ← TreeTagger applied to each corpus
    words ← filter adjectives and nouns
    for index = 0 until total of words do
        if words[index] in seed words then
            for k = 1 until K do
                left word ← words[index - k]
                right word ← words[index + k]
                opinion words ← opinion words ∪ {left word, right word}

In Figure 2, adjectives (JJ) and nouns (NNS) are retrieved using TreeTagger.
The word good is a positive seed word and its nearest adjective is safe,
given distance k = 1. Likewise, the words scientific and studies are
retrieved with distance k = 2. In addition, safe is a positive opinion word
candidate because it occurs near a positive seed word (good). In this way we
obtain a set of opinion words (positive and negative) that are candidates for
inclusion in the resulting lexicon. To get the correlation score of each
extracted word given a seed word, two measures are employed: AcroDefMI3 and
TrueSkill, which are described in the next section.

2.3 Word Selection
As seen in our previous example (Figure 2), the word scientific was retrieved
using a window size distance of 2. However, specific words, such as gmo, can
be used to express a domain opinion. Hence, we need to measure the
correlation of a given extracted word with the domain and the seed word in
order to build a lexicon. To get candidate opinion words we propose to use
the statistical measure AcroDefMI3 (Equation 2) (Roche and Prince, 2007).
Moreover, we also propose a novel probabilistic measure based on the
TrueSkill algorithm (Herbrich et al., 2007) (Algorithm 3).
   The AcroDefMI3 measure takes each word extracted using the Window Size
algorithm and computes Equation 2, which is based on web mining. The total
number of web page results for queries that combine candidate words, seed
words and domain keywords is used in the AcroDefMI3 measure to get the
correlation score of each extracted word.

   AcroDef_{MI3} = \log \left(
       \frac{(nb(sw word AND domain) + nb(word sw AND domain))^{3}}
            {nb(sw AND domain) \cdot nb(word AND domain)} \right)          (2)

where sw is a seed word, nb(x) is the total number of result pages for the
query x submitted to the search engine, and word is the word extracted using
the Window Size algorithm. This process is detailed in Algorithm 2.

Algorithm 2 Word selection algorithm using AcroDefMI3
Require: corpus, seed words = P, keywords of domain
Ensure: correlation score values for each word
    for each corpus do
        words+ = window size(corpus+, P)
        for word in words+ do
            given each seed word and keywords of domain compute
            correlation score:
            score ← max(AcroDefMI3)

   Unlike AcroDefMI3, in the TrueSkill approach words are extracted using the
Window Size algorithm and the measure function is then applied. Furthermore,
words are extracted for each positive (negative) page against k random
negative (positive) pages, and then the word scores are computed. Thus,
TrueSkill sets up a match of positive-page words against negative-page
words. The process is detailed in Figure 3, where S+ = {s1,1, s1,2, ...,
s1,n} and S- = {s2,1, s2,2, ..., s2,n}; s are the learning values of each
word in the positive and negative web pages, respectively. p is the learning
performance of each word, and t is the sum of the performances of all words
in a corpus.

   Scientific/JJ studies/NNS have/VHP frequently/RB found/VVN that/IN
   GMO’s/NNS are/VBP safe/JJ to/TO eat/VV and/CC even/RB good/JJ
   [Brackets in the figure mark window size = 1 and window size = 2.]
   Figure 2: Window size sample for good seed word.

   [Figure 3: factor graph with skill priors N(s_{i,j}; \mu_{i,j},
   \sigma^2_{i,j}), performances N(p_{i,j}; s_{i,j}, \beta^2), team sums
   I(t_1 = p_{1,1} + p_{1,2}) and I(t_2 = p_{2,1} + p_{2,2}), difference
   I(d_1 = t_1 - t_2), and outcome I(d_1 > \varepsilon).]
   Figure 3: TrueSkill model, learning the score of each word selected given
   the positive and negative corpus.

   As TrueSkill learns s according to its match outcome, we assign a higher
score to corpus+ and a lower score to corpus-. Therefore, we have
d = t1 - t2. Since the difference d is what matters, we set t1 = 1 for a
positive corpus and t2 = 2 for a negative corpus, where 1 denotes first
place. This process is detailed in Algorithm 3.

Algorithm 3 Word selection algorithm using TrueSkill
Require: corpus, seed words (P, Q)
Ensure: correlation score values for each word
    k = 10, the number of matches for each corpus
    for each corpus do
        words+ = window size(corpus+, P)
        for k random corpus- do
            words- = window size(corpus-, Q)
            given each word compute correlation score:
            score ← TrueSkill(words+, words-, t = [1, 2])

   The following example shows how TrueSkill measures two collected web
pages:

   corpus+ = By the way a New York Times ··· excellent job ···
             bioengineered food ···.
   corpus- = Roundup Ready cotton ··· wrong solution ··· at any economic
             advantage.

        Team     Words            S^i       S^{i+1}
        word+    bioengineered    22,738    22,809
        word-    economic         0,001     0,022

   Here S^i denotes the current correlation score of each word and S^{i+1}
the updated value after matching pages (positive against negative page);
bioengineered is a word near excellent, a seed word ∈ P, and economic is near
wrong, a seed word ∈ Q, when the Window Size algorithm uses distance k = 1.
Then, when the same corpus+ has a match with another corpus-:

   corpus- = Various studies ··· poor agricultural income ···.

        Team     Words            S^i       S^{i+1}
        word+    bioengineered    22,738    28,023
        word-    agricultural     0,108     4,764

   It is worth noting that agricultural becomes a more negative word than
economic because its value decreases more after the match against the same
positive word, bioengineered. On one hand,
if a word is often found in a corpus-, its value tends to decrease. On the
other hand, if it is found in a corpus+, its value increases. If the word is
found in both corpora, its value tends to remain constant. In the next
section, experimental results are shown.

3    Experiments
In order to validate our approach, experiments over two data sets were
conducted. The polarity of each opinion from the two domains (Agricultural
tweets and Movie reviews) is predicted using the lexicons inferred with the
AcroDefMI3 and TrueSkill measures. Precision, recall and F-score were
measured in order to compare with the SentiWordNet approach. The data sets
used are described in the next section.

3.1 Datasets
The domain keywords used in queries were: Agriculture domain = {gmo,
agricultural biotechnology, biotechnology for agriculture} and Movie domain =
{cinema, film, movie}. To test the agricultural domain, tweets containing
these keywords were collected and manually classified. There were 50 positive
and 61 negative tweets. The Movie domain [4] is based on Pang and Lee (2004)
and contains 1000 positive and 1000 negative reviews.
   [4] http://www.cs.cornell.edu/People/pabo/movie-review-data/
   A simple classification procedure was used. The number of positive and
negative words in each tweet or review is computed using the inferred
lexicons. If the difference is greater than zero, the text is classified as
positive; otherwise it is classified as negative. The following kinds of
lexicon were used for sentiment classification:

    • MI3: seed words + WS with AcroDefMI3.
    • TS: seed words + WS with TrueSkill.
    • SWN: SentiWordNet.

where WS denotes words extracted with the Window Size algorithm. Finally, the
number of web pages retrieved during corpus acquisition for each seed word
was k = 20.
   Next, we show word distributions for each type of lexicon.

3.2 Seed words
Table 1 shows the number of occurrences of each seed word in web pages.

        Seed word      Agricultural    Movie
        superior             42          10
        good                406         178
        positive             54          17
        fortunate            23           4
        excellent            47          20
        correct              24           7
        nice                 40          23
        poor                 58          14
        negative             65          25
        wrong                64          43
        bad                  98          39
        unfortunate          22          27
        nasty                23          15
        inferior             23          11

   Table 1: Seed word (SW) frequency for the Agricultural and Movie domains

3.3 Window size
Using k = 20 web pages, a high number of low-frequency adjectives is
retrieved, as shown in Figure 5a. To get a word near a seed word with window
size = 1, the maximum distance allowed is 10 words per window size.

3.4 Measure function (AcroDefMI3, TrueSkill)
Figures 4 and 5 show the word scores obtained using the proposed measures. It
can be observed that these scores discriminate words better than the Window
Size frequencies shown in Figure 5a. Tables 2 and 3 show the top 5 words of
the inferred lexicons.

3.5 SentiWordNet
SentiWordNet [5] is a lexical resource for opinion mining. It assigns three
sentiment scores (positive, negative and neutral) to each synset of WordNet.
We compute the difference between the positive and negative scores. If the
result is greater than zero, the polarity of the word is positive; otherwise
it is negative. SentiWordNet assigns a different score to each word according
to its context. As context is not considered here, higher positive and
negative word scores are obtained. Finally, SentiWordNet comprises 21479
adjectives and 117798 nouns.
   [5] http://sentiwordnet.isti.cnr.it/
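As a worked illustration of Equation 2 and of the max step in Algorithm 2, the sketch below computes AcroDefMI3 from page counts. The hit counts are invented for illustration, the natural logarithm is an assumption (the paper only writes "log"), and returning minus infinity for zero counts is a convenience choice, not something the paper specifies.

```python
import math


def acrodef_mi3(nb_sw_word, nb_word_sw, nb_sw, nb_word):
    """Equation 2 from page counts; every count is for a query that also
    includes the domain keyword (e.g. '... AND gmo')."""
    joint = nb_sw_word + nb_word_sw
    if joint == 0 or nb_sw == 0 or nb_word == 0:
        return float("-inf")  # assumption: no evidence gives the lowest score
    return math.log(joint ** 3 / (nb_sw * nb_word))


# Hypothetical page counts for the candidate word "safe" and two seed words.
hits = {
    "good": dict(nb_sw_word=120, nb_word_sw=30, nb_sw=50_000, nb_word=8_000),
    "nice": dict(nb_sw_word=10, nb_word_sw=5, nb_sw=20_000, nb_word=8_000),
}
scores = {seed: acrodef_mi3(**counts) for seed, counts in hits.items()}

# Algorithm 2 keeps the maximum score over the seed words.
best_seed = max(scores, key=scores.get)
print(best_seed, round(scores[best_seed], 3))
```

Cubing the joint count, as MI3 does, favours pairs that co-occur often in absolute terms, which reduces the bias of plain mutual information toward rare words.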


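To make the TrueSkill-based scoring of Section 2.3 more concrete, here is a deliberately simplified sketch: the closed-form TrueSkill update for a two-player match with no draws, applied to one positive-page word against one negative-page word. It is not the team factor graph of Figure 3 and not the authors' implementation. The prior (25, 25/3) and `beta = 25/6` are conventional TrueSkill defaults assumed here, and the dynamics term is omitted.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner, loser, beta=25.0 / 6.0):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`.

    Numerically naive: an extreme upset can drive _cdf(t) to zero.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean correction for a win
    w = v * (v + t)         # variance correction, between 0 and 1
    new_w = (mu_w + sig_w ** 2 / c * v,
             sig_w * math.sqrt(1.0 - sig_w ** 2 / c ** 2 * w))
    new_l = (mu_l - sig_l ** 2 / c * v,
             sig_l * math.sqrt(1.0 - sig_l ** 2 / c ** 2 * w))
    return new_w, new_l


# A positive-page word (rank 1) is matched against a negative-page word (rank 2).
prior = (25.0, 25.0 / 3.0)
pos_word, neg_word = update(prior, prior)
print(pos_word, neg_word)
```

Repeating the update over several matches moves words seen mostly in positive pages up and words seen mostly in negative pages down, which mirrors the behaviour described at the end of Section 2.3.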
(a) Word frequency using Window
                                             (b) MI3                              (c) TS
Size (WS)

                           Figure 4: Adjective words for Agricultural domain


(a) Word frequency using Window
                                             (b) MI3                              (c) TS
Size (WS)

                              Figure 5: Adjective words for Movie domain


             WS            MI3           TS
  Positive
             dark          cheap         qualified
             daily         fat           inconclusive
             active        coconut       ideal
             favorite      false         fresh
             full          probiotic     active
  Negative
             stunning      rural         devastating
             german        chemical      irreversible
             hungry        standard      sick
             wealthy       brutish       general
             medical       hungry        chemical

Table 2: Top 5 adjectives for Agricultural domain

             WS            MI3           TS
  Positive
             flavor        luck          note
             fit           night         commitment
             movie         morning       judgment
             opportunity   source        continent
             job           vodka         jihad
  Negative
             farmer        regulation    farmer
             debate        bread         regulation
             cost          guy           group
             intensity     gmos          problem
             gmos          soil          tomato

Table 3: Top 5 nouns for Agricultural domain


3.6 Classification
In order to classify opinions, the inferred lexicons are used. We have positive
and negative lexicons (dictionaries) for each data set (Agricultural, Movie),
as shown in Table 7. In the Agricultural domain, 32 new words have been learned
that do not appear in SentiWordNet. Likewise, in the Movie domain, 20 new words
that do not appear in SentiWordNet have been learned. Table 6 shows the top 10
new words ordered by their correlation score value. In order to validate the
algorithms, we calculate recall, precision and F-score. Figures 6 and 7 show
the recall, precision and F-score for each word type (nouns, adjectives), and
the results using MI3, SentiWordNet and TrueSkill.


                  Figure 6: Tweet classification, left with adjectives, right with nouns.


          Figure 7: Classification using Movie Reviews, left with adjectives, right with nouns.


             WS            MI3           TS
  Positive
             comfy         big           late
             expensive     real          clear
             late          natured       common
             french        sound         french
             infectious    easy          commercial
  Negative
             makeshift     video         emotional
             video         pretty        russian
             lost          english       cartoonish
             fast          acting        treacly
             attentive     full          dull

Table 4: Top 5 adjectives for Movie domain

             WS            MI3           TS
  Positive
             info          place         info
             wife          day           place
             people        food          staff
             service       feel          credo
             party         luck          city
  Negative
             blood         thing         blood
             rate          word          character
             character     person        idea
             interest      blood         progression
             time          video         activity

Table 5: Top 5 nouns for Movie domain

                   Domain
  Agricultural         Movie
  chocolaty            configurable
  glyphosate           updated
  phosphonic           readymade
  carfentrazone        nature
  sporogene            directorial
  kalu                 spendidly
  protato              cartoonish
  adeed                mic
  phthalates           showreel
  genotoxicity         coverup

Table 6: Top 10 words of inferred lexicons using AcroDefMI3 and TrueSkill
methods, which are not in SentiWordNet

  Word           Positive    Negative
  Agricultural
  Adjective      200         119
  Noun           314         189
  Movie
  Adjective      153         141
  Noun           171         183

Table 7: Total of inferred lexicon words by domain.

4   Discussion of the results
When the inferred lexicon for the Movie domain is considered, TrueSkill performs
better (Recall, Precision and F-Score) than SentiWordNet and AcroDefMI3 for
positive reviews using adjectives and nouns. When negative reviews are
considered, TrueSkill performs better using nouns than adjectives.
   On the other hand, in the Agricultural domain, SentiWordNet performs better
than AcroDefMI3 and TrueSkill. This is because the agricultural corpus was
collected from Twitter. Tweets are short texts that usually have more seed words
and common words, as shown in Table 1. The agricultural domain has frequent
seed words.

5   Conclusion
Most of the dictionary-based algorithms for sentiment analysis consider word
frequency in documents. However, this research has shown that words with low
frequencies in the collected corpus can be useful to set polarities. Thus, we
propose a dictionary-based algorithm for sentiment analysis that uses the
AcroDefMI3 and TrueSkill methods to compute word correlation scores that allow
us to differentiate between positive and negative polarities. This is
particularly useful for low-frequency words obtained from the corpus. In
addition, by using the Window Size Algorithm, it is possible to obtain new
adjective entries in both the agricultural and movie domains when compared to
SentiWordNet.

Acknowledgments
This work has been supported and financed by FONDECYT.

References
Abdelmalek Amine, Reda Mohamed Hamou, and Michel Simonet. 2014. Detecting
  opinions in tweets. volume abs/1402.5123.
Benjamin Duthil, François Trousset, Mathieu Roche, Gérard Dray, Michel Plantié,
  Jacky Montmain, and Pascal Poncelet, 2011. Towards an Automatic
  Characterization of Criteria, pages 457–465. Springer Berlin Heidelberg,
  Berlin, Heidelberg.
Ali Harb, Michel Plantie, Gerard Dray, Mathieu Roche, Francois Trousset, and
  Pascal Poncelet. 2008. Web opinion mining: How to extract opinions from
  blogs? In Proceedings of the 5th International Conference on Soft Computing
  As Transdisciplinary Science and Technology, CSTST 08, pages 211–217,
  New York, NY, USA. ACM.
Ralf Herbrich, Tom Minka, and Thore Graepel. 2007. Trueskill(tm): A bayesian
  skill rating system. In Advances in Neural Information Processing Systems 20,
  pages 569–576. MIT Press, January.
Edison Marrese-Taylor, Juan D. Velásquez, Felipe Bravo-Marquez, and Yutaka
  Matsuo. 2013. Identifying customer preferences about tourism products using
  an aspect-based opinion mining approach. Procedia Computer Science,
  22(0):182–191. 17th International Conference in Knowledge Based and
  Intelligent Information and Engineering Systems - KES2013.
Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis
  using subjectivity summarization based on minimum cuts. In Proceedings of
  the ACL.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment
  classification using machine learning techniques. In Proceedings of the
  ACL-02 Conference on Empirical Methods in Natural Language Processing -
  Volume 10, EMNLP ’02, pages 79–86, Stroudsburg, PA, USA. Association for
  Computational Linguistics.
Mathieu Roche and Violaine Prince, 2007. Modeling and Using Context: 6th
  International and Interdisciplinary Conference, CONTEXT 2007, Roskilde,
  Denmark, August 20-24, 2007. Proceedings, chapter AcroDef: A Quality Measure
  for Discriminating Expansions of Ambiguous Acronyms, pages 411–424. Springer
  Berlin Heidelberg, Berlin, Heidelberg.


Peter D. Turney. 2002. Thumbs up or thumbs down?: Semantic orientation applied
  to unsupervised classification of reviews. In Proceedings of the 40th Annual
  Meeting on Association for Computational Linguistics, ACL ’02, pages 417–424,
  Stroudsburg, PA, USA. Association for Computational Linguistics.
R. Varghese and M. Jayasree. 2013. Aspect based sentiment analysis using
  support vector machine classifier. In Advances in Computing, Communications
  and Informatics (ICACCI), 2013 International Conference on, pages 1581–1586,
  Aug.
Guangwei Wang and Kenji Araki. 2007. Modifying so-pmi for japanese weblog
  opinion mining by using a balancing factor and detecting neutral expressions.
  In Human Language Technologies 2007: The Conference of the North American
  Chapter of the Association for Computational Linguistics; Companion Volume,
  Short Papers, NAACL-Short ’07, pages 189–192, Stroudsburg, PA, USA.
  Association for Computational Linguistics.



</pre>