https://ceur-ws.org/Vol-1961/paper08.pdf
       Sentiment Analysis for Real-time Applications∗
    Análisis de sentimientos para aplicaciones en tiempo real
                                             Javi Fernández
                                            University of Alicante
                                                javifm@ua.es

Abstract: In this paper we present a supervised hybrid approach for Sentiment Analysis in Real-time Applications. The main goal of this work is to design an approach which employs very few resources but obtains near state-of-the-art results.
Keywords: Sentiment Analysis, Real-time Applications, Lexicon, Machine Learning
Resumen: In this article we present a supervised hybrid approach for sentiment analysis in real-time applications. The main goal of our work is to design an approach that employs very few resources but obtains results close to the state of the art.
Palabras clave: Sentiment analysis, real-time applications, lexicon, machine learning


1   Introduction

Recent years have seen the birth of social networks and the Web 2.0. They have made it easy for people to share aspects of, and opinions about, their everyday life. This subjective information can be interesting for general users, brands and organisations. However, the vast amount of information (for example, over 500 million messages per day on Twitter¹) makes it difficult for traditional sentiment analysis systems to process this subjective information in real time. The performance of sentiment analysis tools has become increasingly critical.

   The main goal of our work is to design a sentiment analysis approach oriented to real-time applications: an approach that balances efficiency and quality. It must employ very few resources, in order to be able to process as many texts as possible. This will also make sentiment analysis more accessible for everybody. In addition, the quality of the approach should be near state-of-the-art results. In the following sections we explain our approach in detail. Section 2 briefly describes the related work in the field and introduces our work. In Section 3 we detail the approach we propose. Finally, Section 4 concludes the paper and outlines the future work.

∗ This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government, Ministerio de Educación, Cultura y Deporte and Ayudas Fundación BBVA a equipos de investigación científica 2016 through the projects TIN2015-65100-R, TIN2015-65136-C2-2-R, PROMETEOII/2014/001, GRE16-01: Plataforma inteligente para recuperación, análisis y representación de la información generada por usuarios en Internet and Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP).
¹ www.internetlivestats.com/twitter-statistics

2   Related Work

Two main approaches can be followed: machine learning and lexicon-based (Taboada et al., 2011; Medhat, Hassan, y Korashy, 2014; Mohammad, 2015; Ravi y Ravi, 2015). Machine learning approaches treat polarity classification as a text categorisation problem. Texts are usually represented as vectors of features and, depending on the features used, the system can reach better results. If a labelled training set of documents is needed, the approach is defined as supervised learning; if not, it is defined as unsupervised learning. These approaches perform very well in the domain they are trained on, but their performance drops when the same classifier is used in a different domain (Pang y Lee, 2008; Tan et al., 2009). In addition, if the number of features is large, the efficiency drops dramatically.

   Lexicon-based approaches make use of dictionaries of opinionated words and phrases to discern the polarity of a text. In these approaches, each word in the dictionary is assigned a score for each sentiment (e.g. positivity and negativity). To detect the polarity of a text, the scores of its words are combined, and the polarity with the greatest score is chosen. These dictionaries can be generated manually (Tong, 2001), semiautomatically from an initial seed of opinionated words (Kim, Rey, y Hovy, 2004; Baccianella, Esuli, y Sebastiani, 2010), or automatically from a labelled dataset (Jijkoun, de Rijke, y Weerkamp, 2010; Cruz et al., 2013). The major disadvantage of these approaches is their incapability to find opinion words with domain- and context-specific orientations, although the last generation method helps to solve this problem (Medhat, Hassan, y Korashy, 2014). These approaches are usually faster than machine learning ones, as the combination of scores is normally a predefined mathematical function. We decided to use a hybrid approach, trying to take advantage of the categorisation quality of machine learning approaches and the speed of lexicon-based approaches.

   Most current sentiment analysis approaches employ words, n-grams and phrases as information units for their models, either as features for machine learning approaches or as dictionary entries in lexicon-based approaches. However, words and n-grams have some problems representing the flexibility and sequentiality of human language. This is the reason why we decided to use skipgrams. Skipgrams are a technique whereby n-grams are formed (bigrams, trigrams, etc.), but in addition to using adjacent sequences of words, some words are allowed to be skipped (Guthrie et al., 2006). In this way, skipgrams are new terms that retain part of the sequentiality of the terms, but in a more flexible way than n-grams (Fernández et al., 2014). Note that an n-gram can be defined as a 0-skip-n-gram, a skipgram where k = 0. For example, the sentence “I love healthy food” has two word-level trigrams: “I love healthy” and “love healthy food”. However, there is one important trigram implied by the sentence that was not captured: “I love food”. The use of skipgrams allows the word “healthy” to be skipped, providing the mentioned trigram.

3   Methodology

Our contribution consists in a hybrid approach which creates a lexicon from a labelled dataset and builds a polarity classifier from the dataset and the generated lexicon with machine learning techniques. Its architecture can be seen in Figure 1. In the following subsections we explain the different parts of our approach in detail.

[Figure 1: Approach architecture. The dataset feeds a tokenisation step, whose output goes both to lexicon generation, producing the lexicon, and to supervised learning, producing the classifier.]

3.1   Tokenisation

We tried to employ the minimum number of external linguistic tools, to minimise the possible propagation of external errors, in addition to the extra time they can consume. The tokenisation process starts by obtaining all the words in the text. We only extract words containing alphabetic characters. Numbers, punctuation symbols and emoticons are not considered at this moment, but we are studying the best way to include them in the future. The only external resource we employ for the tokenisation process is a stemmer, to obtain the most general form of the words we extracted. We preferred a stemmer over a lemmatiser because stemmers are much faster (Balakrishnan y Lloyd-Yemoh, 2014) and require fewer resources, one of the goals of our approach. Specifically, we used the Snowball² implementation for each language.

   Once we have the words in the text, we combine them using the skipgram modelling to obtain multiword terms. We will use two variables in this work: n will be the maximum number of words when building a new term with the skipgram modelling, and k will be the maximum number of skips. Note that n = 3 includes all the terms with 1, 2 and 3 words, and k = 3 includes 1, 2 and 3 skips.

² snowball.tartarus.org

3.2   Lexicon generation

In summary, our sentiment lexicon consists of a list of terms for each polarity, assigning a score indicating how strongly that term is
related to that polarity. To build this lexicon, we need a polarity-labelled dataset, which will provide both the terms in the lexicon and their scores. There exist many term scoring techniques (Yang y Pedersen, 1997; Chandrashekar y Sahin, 2014), and the majority of them employ probabilities to calculate the scores. However, they do not take full advantage of the skipgram modelling, because they give the same importance to terms whose words were adjacent as to those whose words were not adjacent (because some words were skipped). Because of this, we created our custom scoring formula.

   First, we will describe our counting formulas. In general, when we want to count the number of documents in which a term t occurs, we loop over the dataset and add 1 each time we find that term in a document. Instead, we add a value that is inversely proportional to the number of skips. This is what the formulas in Equations 1 and 2 do, where D is the labelled dataset, |D| is the number of documents in D, d is a document in D, Dp is the subset of documents in D labelled with polarity p, |t| is the number of words in term t, and σ(t, d) is the number of skips of term t in document d.

   C(t) = Σ_{d ∈ D} [t ∈ d] · |t| / (|t| + σ(t, d))      (1)

   C(t, p) = Σ_{d ∈ Dp} [t ∈ d] · |t| / (|t| + σ(t, d))      (2)

   With these counting formulas, the number of skips is taken into account, and we can build our final scoring formula, shown in Equation 3, where s(t, p) is the score of term t for the polarity p, and θ is a factor that gives more relevance to terms that appear a larger number of times. This factor depends on the size and the domain of the dataset.

   s(t, p) = (C(t, p) / C(t)) · (C(t, p) / (C(t, p) + θ))      (3)

   At the end of this process we have a list of skipgrams with a score for each polarity: our sentiment lexicon. Table 1 shows an example of a dictionary built using the Movie Reviews dataset (Pang, Lee, y Vaithyanathan, 2002), with n = 2 and k = 10. In this example, we show only the best five terms for each polarity.

   Negative        Score
   this mess        .871
   worst movie      .863
   is terrible      .852
   ludicrous        .833
   waste            .818

   Positive         Score
   outstanding      .862
   is terrific      .826
   finest           .823
   breathtaking     .803
   is excellent     .795

Table 1: Best five terms of the dictionary built using the Movie Reviews dataset.

3.3   Supervised learning

We use machine learning techniques to create a model able to classify the polarity of new texts. The documents in the dataset are employed as training instances, and the labelled polarities are used as categories. However, in contrast with text classification approaches, we do not create one feature per term; we create one feature per polarity. In other words, we have the same number of features as categories. Our hypothesis is that this number of features is enough to obtain a decent system quality with a low latency. The weight of each feature is calculated as specified in Equation 4, where w(d, p) is the weight of the feature for polarity p in document d.

   w(d, p) = Σ_{t ∈ d} s(t, p) · |t| / (|t| + σ(t, d))      (4)

   Table 2 shows an example of feature weighting for the text “worst movie ever”, using again the scores of a dictionary built with the Movie Reviews dataset, with n = 2 and k = 10. The final weights (positive = 1.48, negative = 3.40) will be employed as feature weights for the machine learning process.

   To build our model we employed Support Vector Machines (SVM), as they have proved to be effective on text categorisation tasks (Sebastiani, 2002; Mohammad, Kiritchenko, y Zhu, 2013). Specifically, we used the Weka³ (Hall et al., 2009) default implementation with the default parameters (linear kernel, C = 1, ε = 0.1).

³ www.cs.waikato.ac.nz/ml/weka
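The lexicon construction and feature weighting of Equations 1–4 can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it assumes pre-tokenised, stemmed input; it builds unigrams and bigrams only (n = 2); θ is set arbitrarily to 1; and when a term occurs several times in a document it keeps the occurrence with the fewest skips, a detail the text does not specify.

```python
from collections import defaultdict
from itertools import combinations

def skipgram_terms(tokens, n=2, k=1):
    """Map each term (tuple of up to n words) to the number of skips of its
    occurrence with fewest skips; n=2 yields unigrams and skip-bigrams."""
    terms = {}
    for tok in tokens:
        terms[(tok,)] = 0                      # unigrams never skip words
    if n >= 2:
        for i, j in combinations(range(len(tokens)), 2):
            skips = j - i - 1                  # words skipped between positions i and j
            if skips <= k:
                term = (tokens[i], tokens[j])
                if term not in terms or skips < terms[term]:
                    terms[term] = skips
    return terms

def build_lexicon(dataset, n=2, k=1, theta=1.0):
    """dataset: iterable of (tokens, polarity) pairs.
    Returns {term: {polarity: s(t, p)}} following Equations 1-3: each
    occurrence contributes |t| / (|t| + sigma(t, d)) to the counts."""
    c_t = defaultdict(float)                              # C(t), Eq. 1
    c_tp = defaultdict(lambda: defaultdict(float))        # C(t, p), Eq. 2
    polarities = set()
    for tokens, polarity in dataset:
        polarities.add(polarity)
        for term, skips in skipgram_terms(tokens, n, k).items():
            contribution = len(term) / (len(term) + skips)
            c_t[term] += contribution
            c_tp[term][polarity] += contribution
    return {
        term: {p: (c_tp[term][p] / c_t[term])
                  * (c_tp[term][p] / (c_tp[term][p] + theta))   # s(t, p), Eq. 3
               for p in polarities}
        for term in c_t
    }

def feature_weights(tokens, lexicon, polarities, n=2, k=1):
    """w(d, p), Equation 4: one aggregated feature per polarity."""
    weights = dict.fromkeys(polarities, 0.0)
    for term, skips in skipgram_terms(tokens, n, k).items():
        if term in lexicon:
            factor = len(term) / (len(term) + skips)
            for p in polarities:
                weights[p] += lexicon[term][p] * factor
    return weights
```

A hypothetical call such as `feature_weights(["worst", "movie"], lexicon, ["pos", "neg"])` returns one weight per polarity; these few weights, rather than one feature per term, would then be the only features passed to the SVM.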
                  Positive      Negative
   worst          0.00 · 1.00   0.79 · 1.00
   movie          0.48 · 1.00   0.51 · 1.00
   ever           0.52 · 1.00   0.45 · 1.00
   worst movie    0.00 · 1.00   0.86 · 1.00
   worst ever     0.00 · 1.00   0.59 · 0.67
   movie ever     0.47 · 1.00   0.37 · 1.00
   weight (w)        1.48          3.40

Table 2: Example of feature weights for the sentence “worst movie ever” using the scores of a dictionary built using the Movie Reviews dataset with n = 2 and k = 10.

4   Discussion

In this paper we presented a supervised hybrid approach for sentiment analysis in Twitter. We built a sentiment lexicon from a polarity dataset using statistical measures. We employed skipgrams as information units, to enrich the sentiment lexicon with combinations of words that do not appear explicitly in the text. The lexicon created was used in conjunction with machine learning techniques to create a polarity classifier.

   Preliminary performance experiments have shown a speed acceptable for real-time applications⁴. Processing speeds go from 1,000 documents per second in the worst cases (long texts, large values of n and k) to 10,000 in the best cases (short texts, low values of n and k). These numbers are good enough to work with extensively used platforms like Twitter, where users generate over 500 million tweets per day (that is, almost 6,000 tweets per second)⁵.

   Moreover, experiments with different datasets have also obtained promising results (Fernández et al., 2013; Fernández, Gómez, y Martínez-Barco, 2014; Fernández et al., 2014; Gutierrez, Tomas, y Fernandez, 2015; Fernández et al., 2015). Experiments with the Movie Reviews dataset (Pang, Lee, y Vaithyanathan, 2002) obtained an accuracy of 86.7%, with long texts in English and 2-level polarity, and 64.7% with the TASS 2012 dataset (Villena-Román y García-Morera, 2013), for Spanish tweets and 6-level polarity.

   As future work, we plan to study new methods to calculate and combine the weights of the skipgrams. We also want to add more features to the machine learning algorithm, while always trying to keep their number small, in order to avoid increasing the latency. In addition, we want to include external resources and tools, such as knowledge from existing sentiment lexicons, but always focused on real-time applications. We will also extend our study to different corpora and domains, to confirm the robustness of the approach.

⁴ Using a MacBook Pro 2.4 GHz i5 with 8GB RAM
⁵ www.internetlivestats.com/twitter-statistics

References

Baccianella, S., A. Esuli, y F. Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. En LREC, volumen 10, páginas 2200–2204.

Balakrishnan, V. y E. Lloyd-Yemoh. 2014. Stemming and lemmatization: a comparison of retrieval performances. Lecture Notes on Software Engineering, 2(3):262.

Chandrashekar, G. y F. Sahin. 2014. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28.

Cruz, F. L., J. A. Troyano, F. Enríquez, F. J. Ortega, y C. G. Vallejo. 2013. Long autonomy or long delay? The importance of domain in opinion mining. Expert Systems with Applications, 40(8):3174–3184.

Fernández, J., J. M. Gómez, y P. Martínez-Barco. 2014. A supervised approach for sentiment analysis using skipgrams. En 11th International Workshop on Natural Language Processing and Cognitive Science (NAACL).

Fernández, J., Y. Gutiérrez, J. M. Gómez, y P. Martínez-Barco. 2014. GPLSI: Supervised sentiment analysis in Twitter using skipgrams. En Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), páginas 294–299.

Fernández, J., Y. Gutiérrez, J. M. Gómez, P. Martínez-Barco, A. Montoyo, y R. Muñoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm and skipgrams. En XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN 2013), páginas 133–142.
Fernández, J., Y. Gutiérrez, D. Tomás, J. M. Gómez, y P. Martínez-Barco. 2015. Evaluating a sentiment analysis approach from a business point of view.

Guthrie, D., B. Allison, W. Liu, L. Guthrie, y Y. Wilks. 2006. A closer look at skip-gram modelling. En 5th International Conference on Language Resources and Evaluation (LREC 2006), páginas 1–4.

Gutierrez, Y., D. Tomas, y J. Fernandez. 2015. Benefits of using ranking skip-gram techniques for opinion mining approaches. En eChallenges e-2015 Conference, páginas 1–10. IEEE.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, y I. H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.

Jijkoun, V., M. de Rijke, y W. Weerkamp. 2010. Generating focused topic-specific sentiment lexicons. En Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, páginas 585–594. Association for Computational Linguistics.

Kim, S.-M., M. Rey, y E. Hovy. 2004. Determining the sentiment of opinions. En Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), página 1367.

Medhat, W., A. Hassan, y H. Korashy. 2014. Sentiment analysis algorithms and applications: a survey. Ain Shams Engineering Journal.

Mohammad, S. M. 2015. Sentiment analysis: Detecting valence, emotions, and other affectual states from text. Emotion Measurement, páginas 201–238.

Mohammad, S. M., S. Kiritchenko, y X. Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. En Proceedings of the International Workshop on Semantic Evaluation (SemEval-2013).

Pang, B. y L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.

Pang, B., L. Lee, y S. Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. En Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), páginas 79–86.

Ravi, K. y V. Ravi. 2015. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems, 89:14–46.

Sebastiani, F. 2002. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.

Taboada, M., J. Brooke, M. Tofiloski, K. Voll, y M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Tan, S., X. Cheng, Y. Wang, y H. Xu. 2009. Adapting naive Bayes to domain adaptation for sentiment analysis. Advances in Information Retrieval, páginas 337–349.

Tong, R. M. 2001. An operational system for detecting and tracking opinions in on-line discussion. En Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification, volumen 1, página 6.

Villena-Román, J. y J. García-Morera. 2013. TASS 2013 – Workshop on Sentiment Analysis at SEPLN 2013: An overview. En XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN 2013).

Yang, Y. y J. O. Pedersen. 1997. A comparative study on feature selection in text categorization. En ICML, volumen 97, páginas 412–420.