=Paper=
{{Paper
|id=Vol-1179/CLEF2013wn-RepLab-MosqueraEt2013
|storemode=property
|title=DLSI-Volvam at RepLab 2013: Polarity Classification on Twitter Data
|pdfUrl=https://ceur-ws.org/Vol-1179/CLEF2013wn-RepLab-MosqueraEt2013.pdf
|volume=Vol-1179
|dblpUrl=https://dblp.org/rec/conf/clef/LopezFGMM13
}}
==DLSI-Volvam at RepLab 2013: Polarity Classification on Twitter Data==
Alejandro Mosquera 1,2, Javi Fernández 1, José M. Gómez 1, Patricio Martínez-Barco 1, and Paloma Moreda 1

1 Department of Software and Computing Systems, University of Alicante, Alicante, Spain (http://www.dlsi.ua.es)
2 Volvam Analytics Ltd., Dublin, Ireland (http://www.volvam.com)

{amosquera,javifm,jmgomez,patricio,paloma}@dlsi.ua.es

Abstract. This paper describes our participation in the profiling (polarity classification) task of the RepLab 2013 workshop. This task focuses on determining whether a given text from Twitter contains a positive or a negative statement related to the reputation of a given entity. We cover three different approaches, one unsupervised and two supervised, which combine machine learning and lexicon-based techniques with an emotional concept model. These approaches were adapted to English and Spanish depending on the resources available for each language. We obtained promising results in the overall evaluations, reaching an F-score of 34% and a sensitivity of 40% in the best cases. The reasonable level of performance compared to other methods encourages us to continue working on the improvement of the proposed approaches.

Keywords: online reputation, sentiment analysis, polarity classification, text normalisation, machine learning, lexicon, emotion concepts

1 Introduction

Nowadays, social media applications allow users to participate actively through their comments and opinions, stated about a wide range of topics and services. This subjective information is very valuable because it determines the reputation of public figures and companies in the marketplace of personal and business relationships. However, it is not feasible to monitor this information manually, because the amount of information is very large and is updated very quickly. Therefore, automatising this process is essential. The field of on-line reputation management (ORM) studies automated ways to track the opinion of users about qualitative or quantitative aspects, dealing with several challenges such as subjectivity, textual noise and domain heterogeneity. This task is very complex, as it involves important issues in opinion mining, sentiment analysis, bias detection, named entity discrimination, topic modelling and other aspects which are not trivial in natural language processing [1].

RepLab 2013 is a competitive evaluation exercise for ORM systems, focused on monitoring the reputation of entities (companies, organisations, celebrities, etc.) on Twitter (http://www.twitter.com) [2]. In this article we focus on our participation in the profiling (polarity classification) task. The goal of this task is to decide whether the tweet content has positive or negative implications for the reputation of a given entity. Polarity for reputation is substantially different from standard sentiment analysis, because the goal is to find what implications a piece of information has, regardless of whether its content is opinionated. In addition, negative sentiments do not always imply negative polarity for reputation, and vice versa (e.g. "R.I.P. Whitney Houston. We'll miss you" has a negative associated sentiment but a positive implication for the reputation of Whitney Houston).

We propose three different approaches to face this task. Our first approach is unsupervised and makes use of fuzzy lexicons in order to catch the informal variants that are common in Twitter texts. The second one is supervised and extends the first approach with machine learning (ML) techniques and an emotion concept model. Finally, the last one also employs ML, but this time following the bag-of-concepts (BoC) approach with common-sense affective knowledge. Each approach has been adapted to English and Spanish, depending on the resources available for each language.

The remainder of the paper is structured as follows. In Section 2 we describe the proposed approaches, as well as the tools and resources used in their implementation. The experiments performed and their evaluation and discussion are provided in Section 3. Finally, Section 4 concludes the paper and outlines future work.

2 Polarity Classification

The following sections explain the three approaches we submitted to the polarity classification subtask of RepLab 2013, focusing on the techniques, tools and resources employed in the design and implementation of each one. Their main goal is to determine whether a tweet has a positive, negative or neutral impact on the reputation of a given entity. Each approach was adapted to English and Spanish but, as not all the required resources are available for both languages, the adaptations are not symmetric. The preprocessing module, common to all our approaches, is explained in Section 2.1. The first approach is unsupervised and is described in Section 2.2. In Sections 2.3 and 2.4 we explain the supervised approaches.

2.1 Preprocessing

Tweets are preprocessed before applying any model by following these common steps, for both English and Spanish:

1) Cleansing. All the words with non-standard characters are removed.
2) Tokenisation. The text is first split into sentences using regular expressions.
3) Lemmatisation. For each sentence we extract the lemmas of its words. For English texts this is done with the MBLEM lemmatiser (http://ilk.uvt.nl/mbma/), which combines a memory-based ML algorithm with a dictionary lookup. Freeling (http://nlp.lsi.upc.edu/freeling/) [3] was the tool selected for extracting lemmas from Spanish sentences. In order to obtain accurate lemmas, a custom dictionary was created to replace common out-of-vocabulary (OOV) words, such as misspellings and informal lexical variants, with their canonical versions (e.g. lol → laugh; q → que).
4) URL removal. Each URL is substituted with a place-holder tag (URL).
5) Twitter hashtag splitting. Hashtags can contain sentiment-related information, so we split them into independent words using a cost function based on word frequencies (e.g. #WeHateVF → we hate VF); see the sketch after this list.
6) Emoticon normalisation. We follow the approach of [4] in order to replace emoticons with their textual equivalents (e.g. xDDD → I am happy).
7) Named-entity detection. Locations, people and temporal expressions are detected using a maximum entropy tagger trained on the CoNLL dataset [5].
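The paper does not publish its splitter, but a frequency-based cost function of the kind described in step 5 is commonly implemented as dynamic programming over unigram frequencies, where rarer words cost more (negative log probability). The following is a minimal Python sketch under that assumption; the toy WORD_FREQ table, the unknown-word penalty and the function names are our own illustration, not the authors' code.

```python
import math
import re

# Toy unigram frequency table; a real system would load counts from a large
# corpus (e.g. Twitter or web n-grams). Illustrative values only.
WORD_FREQ = {"we": 5_000_000, "hate": 300_000, "vf": 2_000, "love": 900_000}
TOTAL = sum(WORD_FREQ.values())

def word_cost(word: str) -> float:
    """Negative log probability; unseen words get a high, length-scaled cost."""
    freq = WORD_FREQ.get(word)
    if freq is None:
        return 10.0 + 2.0 * len(word)  # discourage, but allow, unknown chunks
    return -math.log(freq / TOTAL)

def split_hashtag(tag: str) -> list[str]:
    """Split '#WeHateVF' into ['we', 'hate', 'vf'] by minimising total cost."""
    text = re.sub(r"^#", "", tag).lower()
    # best[i] = (cost of the best segmentation of text[:i], last split point)
    best = [(0.0, 0)] + [(math.inf, 0)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 20), i):  # cap candidate word length at 20
            cost = best[j][0] + word_cost(text[j:i])
            if cost < best[i][0]:
                best[i] = (cost, j)
    words, i = [], len(text)
    while i > 0:  # walk the split points backwards to recover the words
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(split_hashtag("#WeHateVF"))  # ['we', 'hate', 'vf']
```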
2.2 Volvam Polarity 1: Unsupervised Lexicon-Based Model

Our first submitted run makes use of the fuzzy lexicons of SentiStrength (http://sentistrength.wlv.ac.uk) [6] in order to detect the most common informal terminology used in Twitter. These lexicons indicate not only whether a term represents a positive or a negative opinion, but also an intensity score. The terms in these lexicons are English, so we manually translated them to obtain corresponding Spanish lexicons. In addition, we extended the lexicons to allow the detection of modifiers that can invert (negation), increase or decrease the polarity score of each term. The polarity score of a text $T$ is calculated by adding the lexicon scores of each term $t$ inside that text:

$$\mathrm{polarityScore}(T) = \sum_{t \in T} \mathrm{lexiconScore}(t) \cdot \mathrm{modifiersScore}(t) \qquad (1)$$

where $\mathrm{lexiconScore}(t)$ is the polarity score of the term $t$ (range $[-4, 4]$) and $\mathrm{modifiersScore}(t)$ is the score given to the term $t$ by its modifiers (range $[-1, 1]$). Finally, the polarity of the text is assigned depending on the polarity score obtained, using the following rule:

$$\mathrm{polarity}(T) = \begin{cases} \text{positive} & \text{if } \mathrm{polarityScore}(T) > 0 \\ \text{neutral} & \text{if } \mathrm{polarityScore}(T) = 0 \\ \text{negative} & \text{if } \mathrm{polarityScore}(T) < 0 \end{cases}$$
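As a concrete reading of equation (1) and the sign rule, the minimal Python sketch below scores a tokenised text. The toy LEXICON and MODIFIERS tables are illustrative stand-ins, not the actual SentiStrength entries or our extended modifier lists, and the one-token-lookbehind treatment of modifiers is a simplifying assumption.

```python
# Toy lexicon: term -> polarity score in [-4, 4]. Illustrative values only.
LEXICON = {"love": 3, "hate": -4, "nice": 2, "awful": -3}
# Toy modifiers: negators invert the score, others scale it, range [-1, 1].
MODIFIERS = {"not": -1.0, "very": 1.0, "slightly": 0.5}

def polarity_score(tokens: list[str]) -> float:
    """Equation (1): sum of lexiconScore(t) * modifiersScore(t) over terms."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        # Apply the score of the immediately preceding modifier, if any.
        mod = MODIFIERS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
        score += LEXICON[tok] * mod
    return score

def polarity(tokens: list[str]) -> str:
    """Map the numeric score to a label by its sign."""
    s = polarity_score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(polarity("i do not love this".split()))  # negative (negated 'love')
print(polarity("this is very nice".split()))   # positive
```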
2.3 Volvam Polarity 2: Supervised Model combining Lexicons and Concepts

Our second submitted run uses a supervised ML model. The features used for this model are generated using the unsupervised model from Section 2.2:

– TotalPolarity. Total polarity obtained from the unsupervised model.
– AvgSubjectivity. Average subjectivity values extracted from the v2.0 polarity dataset [7].
– CountPositive. Number of positive words in the text.
– CountNegative. Number of negative words in the text.
– CountNeutral. Number of neutral words in the text.
– SentenceTokens. Tokens/sentence ratio.
– TotalSubjectivity. Total subjectivity value.
– CountSubjective. Number of words with subjectivity > 0.
– CountProfanity. Number of profanity words.
– CountQuestions. Number of sentences that are questions.
– CountNonQuestions. Number of sentences that are not questions.
– CountNegated. Number of negated sentences.
– CountModPlus. Number of augmentative modifiers.
– CountModMinus. Number of diminutive modifiers.

In addition, for the English texts, we added emotion-based features from SenticNet [8]. SenticNet consists of a lexicon containing four concept dimensions for each term: pleasantness, attention, sensitivity and aptitude. These concepts and their scores are used as additional features to build the ML model.

As the training dataset provided for this task was highly unbalanced, in terms of both language and polarity labels, we followed a cross-corpus approach. As the training set for the English language we used the sentiment analysis training dataset from SemEval 2013 [9] and, for the Spanish language, the TASS 2012 training set [10]. The classification model was built using the Random Forests [11] ensemble classifier on a subset of 6000 tweets.
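To illustrate how such hand-crafted features can feed an ensemble classifier, the scikit-learn sketch below trains a Random Forest on a toy feature matrix using a subset of the feature names listed above. The feature values, the training rows and the unseen-tweet example are invented for illustration; this is not the RepLab data or the exact pipeline of the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Subset of the feature names from Section 2.3, shown for brevity.
FEATURES = ["TotalPolarity", "CountPositive", "CountNegative",
            "CountSubjective", "CountNegated"]

# Toy training matrix: one row per tweet, one column per feature. In the
# actual system these values come from the unsupervised model of Section 2.2;
# here they are invented placeholders.
X_train = np.array([
    [ 3.0, 2, 0, 2, 0],   # clearly positive tweet
    [-4.0, 0, 2, 2, 1],   # clearly negative tweet
    [ 0.0, 0, 0, 0, 0],   # neutral tweet
    [ 2.0, 1, 0, 1, 0],
    [-2.0, 0, 1, 1, 1],
])
y_train = ["positive", "negative", "neutral", "positive", "negative"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

x_new = np.array([[1.5, 1, 0, 1, 0]])  # feature vector of an unseen tweet
print(clf.predict(x_new))              # e.g. ['positive']
```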
2.4 Volvam Polarity 3: Supervised Model using Bag-of-Concepts

In our last submission we created different models for each language. For English, a Random Forest classifier was built using concept count vectors extracted from the provided RepLab training data, following the BoC approach with SenticNet common-sense affective knowledge. As we did not find an equivalent emotion-based model for Spanish, we followed a simpler bag-of-words approach using the lemmas of the terms in the text.
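A bag-of-concepts model can be realised as a count vectoriser over concept identifiers instead of surface words. The sketch below assumes tweets have already been mapped to SenticNet-style concept labels; that mapping, the concept names, the documents and the labels are all invented placeholders, not the actual RepLab training data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Tweets already mapped to space-separated concept labels. The text-to-concept
# mapping is assumed to exist upstream; these labels are illustrative only.
concept_docs = [
    "celebrate_special_occasion feel_happy",
    "lose_money feel_angry",
    "feel_happy buy_gift",
    "feel_angry bad_service",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-concepts: each column counts the occurrences of one concept.
vectorizer = CountVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(concept_docs)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

x_new = vectorizer.transform(["feel_happy celebrate_special_occasion"])
print(clf.predict(x_new))  # e.g. ['positive']
```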
3 Evaluation

Our system was evaluated in terms of accuracy and F(R, S) [12], where R (reliability) is the precision of the relations predicted by the system with respect to the actual relations in the gold standard, S (sensitivity) is the recall of the predicted relations with respect to the actual relations in the gold standard, and F(R, S) combines the two (their harmonic mean). A comparison of the results obtained is given in Table 1 (only the best run of each of the other teams is displayed, for informative purposes).

| Method | Accuracy | Reliability | Sensitivity | F(R, S) |
|---|---|---|---|---|
| SZTE NLP polarity 6 | 0.685 | 0.465 | 0.345 | 0.381 |
| popstar polarity 5 | 0.638 | 0.433 | 0.339 | 0.373 |
| Daedalus polarity 3 | 0.438 | 0.312 | 0.397 | 0.341 |
| Volvam polarity 2 | 0.408 | 0.313 | 0.394 | 0.340 |
| Volvam polarity 1 | 0.389 | 0.302 | 0.402 | 0.336 |
| NLP IR GROUP UNED polarity 1 | 0.578 | 0.333 | 0.309 | 0.316 |
| lia polarity 5 | 0.644 | 0.446 | 0.268 | 0.311 |
| UAMCLYR polarity 05 | 0.577 | 0.329 | 0.286 | 0.300 |
| replab2013 UNED ORM polarity 1 | 0.587 | 0.316 | 0.290 | 0.298 |
| Baseline | 0.584 | 0.315 | 0.289 | 0.297 |
| GAVKTH polarity 2 | 0.263 | 0.371 | 0.213 | 0.267 |
| Volvam polarity 3 | 0.537 | 0.315 | 0.225 | 0.255 |
| diue polarity 1 | 0.546 | 0.333 | 0.215 | 0.254 |
| IE-Polarity-4 | 0.513 | 0.279 | 0.222 | 0.212 |
| ALLPOSITIVE | 0.577 | 1 | 0 | 0 |

Table 1. Polarity classification results at RepLab 2013.

In general, the results obtained by the participants are not as high as the state-of-the-art results in polarity classification. This happens because polarity for reputation is a more complex task [1]. In addition, the datasets provided are highly unbalanced, so the accuracies are not significant [13, 14]. This can be seen in the results of the trivial ALLPOSITIVE run, where all texts are classified as positive, which still achieves an accuracy of 57%. Our best ranked approach is the second one, with an F-score of 34%, very close to the 38% obtained by the best participant run. Our first approach reached the best sensitivity of all runs, at 40%.

4 Conclusions

In this paper we described our participation in the profiling (polarity classification) task of the RepLab 2013 workshop. We covered three different approaches, one unsupervised and two supervised, combining machine learning and lexicon-based techniques with an emotional concept model. These approaches were adapted to English and Spanish depending on the resources available for each language. We obtained promising results in the overall evaluations, reaching an F-score of 34% and a sensitivity of 40% in the best cases. The reasonable level of performance compared to other methods encourages us to continue working on the improvement of the proposed approaches.

References

1. Balahur, A.: The challenge of processing opinions in online contents in the social web era. In: Proceedings of the Language Engineering for Online Reputation Management Workshop, LREC 2012 (2012)
2. Amigó, E., Carrillo de Albornoz, J., Chugur, I., Corujo, A., Gonzalo, J., Martín, T., Meij, E., de Rijke, M., Spina, D.: Overview of RepLab 2013: Evaluating online reputation monitoring systems. In: Fourth International Conference of the CLEF Initiative, CLEF 2013, Valencia, Spain. Proceedings. Springer LNCS (2013)
3. Padró, L., Stanilovsky, E.: FreeLing 3.0: Towards wider multilinguality. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., eds.: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, European Language Resources Association (ELRA) (May 2012)
4. Mosquera, A., Lloret, E., Moreda, P.: Towards facilitating the accessibility of Web 2.0 texts through text normalisation. In: Proceedings of the LREC Workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey (2012) 9–14
5. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Volume 4, Association for Computational Linguistics (2003) 142–147
6. Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology 61(12) (2010) 2544–2558
7. Pang, B., Lee, L.: A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the ACL (2004)
8. Cambria, E., Havasi, C., Hussain, A.: SenticNet 2: A semantic and affective resource for opinion mining and sentiment analysis. In: Youngblood, G.M., McCarthy, P.M., eds.: FLAIRS Conference, AAAI Press (2012)
9. Wilson, T., Kozareva, Z., Nakov, P., Rosenthal, S., Stoyanov, V., Ritter, A.: SemEval-2013 Task 2: Sentiment analysis in Twitter. In: Proceedings of the International Workshop on Semantic Evaluation, SemEval. Volume 13 (2013)
10. Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., González-Cristóbal, J.C.: TASS - Workshop on sentiment analysis at SEPLN. Procesamiento del Lenguaje Natural 50 (2013) 37–44
11. Breiman, L.: Random forests. Machine Learning 45(1) (October 2001) 5–32
12. Amigó, E., Gonzalo, J., Verdejo, F.: Reliability and sensitivity: Generic evaluation measures for document organization tasks. Technical report, UNED (2012)
13. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM (1999) 42–49
14. Boldrini, E., Fernández Martínez, J., Gómez Soriano, J.M., Martínez Barco, P., et al.: Machine learning techniques for automatic opinion detection in non-traditional textual genres. (2009)