<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jhon Adrian Ceron-Guzman</string-name>
          <email>jadrian.ceron@gmail.com</email>
        </contrib>
        <aff>Santiago de Cali, Valle del Cauca, Colombia</aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>35</fpage>
      <lpage>39</lpage>
      <abstract>
        <p>This paper describes an ensemble-based approach developed to participate in TASS-2016 Task 1 on sentiment analysis of Spanish tweets at global level. Ensembles are built on the combination of systems with the lowest absolute correlation with each other. The systems are able to deal with non-standard lexical forms in tweets, in order to improve the quality of natural language analysis. To support the polarity classification, the approach uses basic features that have proved their discriminative power, as well as word and character n-gram features. Then, outputs from Logistic Regression classifiers, which may be either class labels or probabilities for each class, are used to build ensembles. Experimental results show that the less-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability, is the setting that best suits the task, achieving an overall accuracy of 62.0% in the six-labels evaluation, and of 70.5% in the four-labels evaluation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        What people say on social media about issues of their everyday life, society, and the world in general has turned into a rich source of information for understanding social behavior. Twitter content, in particular, has caught the attention of researchers who have investigated its potential for conducting studies on human subjectivity at a large scale, which was not feasible using traditional methods. Around election time, sentiment analysis of political tweets has been widely used to capture trends in public opinion regarding important issues such as voting intention
        <xref ref-type="bibr" rid="ref3">(Gayo-Avello, 2013)</xref>
        . However, analyzing this content also presents several challenges, including the development of text analysis approaches based on Natural Language Processing techniques that properly adapt to the informal genre and the free writing style of Twitter
        <xref ref-type="bibr" rid="ref1 ref4">(Han and Baldwin, 2011; Ceron-Guzman and Leon-Guzman, 2016)</xref>
        .
      </p>
      <p>
        TASS is a workshop aimed at fostering research on sentiment analysis of Spanish Twitter data, which provides a benchmark evaluation to compare the latest advances in the field
        <xref ref-type="bibr" rid="ref2">(García-Cumbreras et al., 2016)</xref>
        . One of the proposed tasks is to determine the opinion orientation expressed in tweets at a global level. Task 1 consists of assigning one of six labels (P+, P, NEU, N, N+, NONE) to a tweet in the six-labels evaluation, or one of four labels (P, NEU, N, NONE) in the four-labels evaluation. Here, P, N, and NEU stand for positive, negative, and neutral, respectively; NONE, instead, means no sentiment. The "+" symbol is used as an intensifier.
      </p>
      <p>This paper presents an ensemble-based approach to polarity classification of Spanish tweets, developed to participate in Task 1 proposed by the organizing committee of the TASS workshop. The ensemble members are (relatively) highly accurate classifiers with the lowest absolute correlation with each other. The output from each classifier, which may be either a class label or probabilities for each class, is used to assign the polarity of a tweet based on a majority rule or on the highest unweighted average probability. Moreover, the classifiers are adapted to deal with non-standard lexical forms in tweets, in order to improve the quality of natural language analysis.</p>
      <p>The remainder of this paper is organized as follows. Section 2 describes the common architecture of the ensemble members (i.e., classifiers). Next, the submitted experiments, as well as the obtained results, are discussed in Section 3. Finally, Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>The System Architecture</title>
      <p>The tweet text is passed through the pipeline of each system in order to assign it a class label or a probability of belonging to a certain class. The pipeline, which goes from text preprocessing to machine learning classification, is described below. Note that the term "system" is preferred over "classifier", because a machine learning classifier receives a feature vector and produces a class label or probabilities for each class; the term "system", instead, encompasses the whole process, from preprocessing to machine learning classification.</p>
      <sec id="sec-2-1">
        <title>Preprocessing</title>
        <p>The process of text cleaning and
normalization is performed in two phases: basic
preprocessing and advanced preprocessing.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1 Basic Preprocessing</title>
          <p>The following simple rules are implemented as regular expressions:</p>
          <p>Removing URLs and emails.</p>
          <p>HTML entities are mapped to their textual representations (e.g., "&amp;lt;" → "&lt;"). Specific Twitter terms such as mentions (@user) and hashtags (#topic) are replaced by placeholders.</p>
          <p>Unknown characters are mapped to their closest ASCII variant, using the Python Unidecode module for the mapping. Consecutive repetitions of the same character are reduced to one occurrence. Emoticons are recognized and then classified as positive or negative, according to the sentiment they convey (e.g., ":)" → "EMO_POS", ":(" → "EMO_NEG").</p>
          <p>
            Unification of punctuation marks
            <xref ref-type="bibr" rid="ref11">(Vilares, Alonso, and Gómez-Rodríguez, 2014)</xref>
            .
          </p>
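          <p>As an illustration, the rules above can be sketched as a chain of regular-expression substitutions. This is a hypothetical sketch in Python; the placeholder tokens (USER, TOPIC, EMO_POS, EMO_NEG) and the exact patterns are assumptions, not the authors' implementation.</p>
          <preformat>
```python
import re

def basic_preprocess(text):
    # Hypothetical sketch of the basic preprocessing rules; placeholder
    # tokens and patterns are assumptions, not the authors' exact choices.
    text = re.sub(r"https?://\S+", "", text)          # remove URLs
    text = re.sub(r"\S+@\S+\.\S+", "", text)          # remove emails
    text = re.sub(r"@\w+", "USER", text)              # replace mentions
    text = re.sub(r"#\w+", "TOPIC", text)             # replace hashtags
    text = re.sub(r":-?\)+|:D", " EMO_POS ", text)    # positive emoticons
    text = re.sub(r":-?\(+", " EMO_NEG ", text)       # negative emoticons
    text = re.sub(r"(.)\1+", r"\1", text)             # collapse repeated chars
    return re.sub(r"\s+", " ", text).strip()
```
          </preformat>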
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2 Advanced Preprocessing</title>
          <p>
            Once the set of simple rules has been applied, the tweet text is tokenized and morphologically analyzed by FreeLing
            <xref ref-type="bibr" rid="ref7">(Padró and Stanilovsky, 2012)</xref>
            . In this way, each resulting token is assigned its lemma and Part-of-Speech (POS) tag. Taking these data as input, the following advanced preprocessing is applied:
          </p>
          <p>
            Lexical normalization. Each token is passed through a set of basic modules of FreeLing (e.g., dictionary lookup, suffix checking, detection of numbers and dates, and named entity recognition) for identifying standard word forms and other valid constructions. If a token is not recognized by any of the modules, it is marked as an out-of-vocabulary (OOV) word. Then, a confusion set is formed by normalization candidates which are identical or similar to the graphemes or phonemes that make up the OOV word. These candidates are elements of the union of a dictionary of Spanish standard word forms and a gazetteer of proper nouns. The best normalization candidate for the OOV word is the one that best fits a statistical language model, which was estimated from the Spanish Wikipedia corpus. Lastly, the selected candidate is capitalized according to the capitalization rules of the Spanish language. Extensive research on lexical normalization of Spanish tweets can be found in
            <xref ref-type="bibr" rid="ref1">(Ceron-Guzman and Leon-Guzman, 2016)</xref>
            .
          </p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Negation Handling</title>
          <p>
            Inspired by the approach proposed by Pang et al.
            <xref ref-type="bibr" rid="ref8">(Pang, Lee, and Vaithyanathan, 2002)</xref>
            , this research defined a negated context as a segment of the tweet that starts with a (Spanish) negation word and ends with a punctuation mark (i.e., "!", ",", ":", "?", ".", ";"), but only the first n ∈ [0, 3] tokens, or all tokens labeled with any or a specific POS tag (i.e., verb, adjective, adverb, and common noun), are affected by appending the "_NEG" suffix to them. Note that when n = 0, no token is affected.
          </p>
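          <p>A minimal sketch of this rule, assuming n limits how many tokens of the negated context are tagged (the POS-tag filter is omitted for brevity, and the negation-word list is illustrative, not the authors' exact resource):</p>
          <preformat>
```python
# Hypothetical sketch of the negation-handling rule: tokens following a
# Spanish negation word receive the "_NEG" suffix until a punctuation
# mark closes the context, affecting at most the first n tokens.
NEGATION_WORDS = {"no", "ni", "nunca", "jamas", "tampoco"}
PUNCTUATION = {"!", ",", ":", "?", ".", ";"}

def mark_negation(tokens, n=3):
    out, remaining = [], 0
    for tok in tokens:
        if tok.lower() in NEGATION_WORDS:
            out.append(tok)
            remaining = n            # open a negated context
        elif tok in PUNCTUATION:
            out.append(tok)
            remaining = 0            # punctuation closes the context
        elif remaining > 0:
            out.append(tok + "_NEG")
            remaining -= 1
        else:
            out.append(tok)
    return out
```
          </preformat>
          <p>With n = 2, the tokens "no me gusta nada . bien" become "no me_NEG gusta_NEG nada . bien".</p>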
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Feature Extraction</title>
        <p>In this stage, the normalized tweet text is transformed into a feature vector that feeds the machine learning classifier. The features are grouped into basic features and n-gram features.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1 Basic Features</title>
          <p>Some of these features are computed before the process of text cleaning and normalization is performed.</p>
          <p>The number of words completely in uppercase.</p>
          <p>The number of words with more than two consecutive repetitions of the same character.</p>
          <p>The number of consecutive repetitions of exclamation marks, question marks, and combinations of both (e.g., "!!", "??", "?!"), and whether the text ends with an exclamation or question mark.</p>
          <p>The number of occurrences of each class
of emoticons (i.e., positive and negative)
and whether the last token of the tweet
is an emoticon.</p>
          <p>
            The number of positive and negative words, relative to the ElhPolar lexicon
            <xref ref-type="bibr" rid="ref10">(Saralegi and Vicente, 2013)</xref>
            , the AFINN lexicon
            <xref ref-type="bibr" rid="ref6">(Nielsen, 2011)</xref>
            , or a union of both lexicons. In a negated context, the label of a polarity word is inverted (i.e., positive words become negative words, and vice versa). Additionally, a third feature labels the tweet with the class whose number of polarity words in the text is the highest.
          </p>
          <p>The number of negated contexts.</p>
          <p>The number of occurrences of each Part-of-Speech tag.</p>
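          <p>The lexicon-based counts, including the polarity inversion in negated contexts, can be sketched as follows. The tiny lexicon below is a stand-in for ElhPolar/AFINN, and the "_NEG" suffix convention follows the negation-handling step; both are illustrative assumptions.</p>
          <preformat>
```python
# Hypothetical sketch of the lexicon features: counts of positive and
# negative words, flipping the polarity of words inside negated
# contexts (tokens carrying the "_NEG" suffix).
POSITIVE = {"bueno", "feliz", "genial"}
NEGATIVE = {"malo", "triste", "feo"}

def polarity_counts(tokens):
    pos = neg = 0
    for tok in tokens:
        negated = tok.endswith("_NEG")
        word = tok[:-4] if negated else tok   # strip the "_NEG" suffix
        if word in POSITIVE:
            pos, neg = (pos, neg + 1) if negated else (pos + 1, neg)
        elif word in NEGATIVE:
            pos, neg = (pos + 1, neg) if negated else (pos, neg + 1)
    return pos, neg
```
          </preformat>
          <p>For example, "bueno_NEG" counts as one negative word, so polarity_counts(["bueno_NEG", "feo", "genial"]) yields one positive and two negative words.</p>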
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2 N-gram Features</title>
          <p>
            The fixed-length set of basic features is always extracted from tweets. However, tweet texts vary from one another in terms of length, number of tokens, and vocabulary used. For that reason, a process that transforms textual data into numerical feature vectors of fixed length is required. This process, known as vectorization, is performed by applying the tf-idf weighting scheme
            <xref ref-type="bibr" rid="ref5">(Manning, Raghavan, and Schütze, 2008)</xref>
            . Thus, each document (i.e., a tweet text) is represented as a vector d = {t1, ..., tn} ∈ R^V, where V is the size of the vocabulary that was built by considering word n-grams with n ∈ [1, 4], or character n-grams with n ∈ [3, 5], in the collection (i.e., the training set). The vector is, hence, formed by word n-grams, character n-grams, or a concatenation of word and character n-grams.
          </p>
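          <p>A sketch of this vectorization step using Scikit-learn (the corpus here is illustrative; parameters other than the n-gram ranges are left at their defaults):</p>
          <preformat>
```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Sketch: tf-idf over word 1-4-grams and character 3-5-grams,
# concatenated into a single fixed-length feature matrix.
corpus = ["no me gusta nada", "me encanta este dia", "que dia tan feo"]

word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 4))
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))

X = hstack([word_vec.fit_transform(corpus),
            char_vec.fit_transform(corpus)])  # one row per tweet
```
          </preformat>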
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Machine Learning Classification</title>
        <p>
          At the last stage, the sentiment analysis system classifies a given tweet as either P+, P, NEU, N, N+, or NONE, or assigns probabilities for each class. After receiving the feature vector as input, an L2-regularized Logistic Regression classifier assigns a class label to the tweet or a probability of belonging to a certain class. The classifier was trained on the training set, using the Scikit-learn
          <xref ref-type="bibr" rid="ref9">(Pedregosa et al., 2011)</xref>
          implementation of the Logistic Regression algorithm.
        </p>
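        <p>A minimal sketch of this classification stage, using toy feature vectors (Scikit-learn's LogisticRegression applies L2 regularization by default):</p>
        <preformat>
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: an L2-regularized Logistic Regression that can output either
# a class label (predict) or per-class probabilities (predict_proba),
# the two kinds of outputs later combined by the ensembles.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
y = np.array(["P", "P", "N", "N"])

clf = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
label = clf.predict([[0.8, 0.2]])[0]        # a class label
proba = clf.predict_proba([[0.8, 0.2]])[0]  # probabilities per class
```
          </preformat>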
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>1,720 different sentiment analysis systems were trained on the training set via 5-fold cross-validation, in order to find the best parameter settings, namely: negation handling, polarity lexicon, order of word and character n-grams, and other parameters related to the vectorization process (e.g., lowercasing, frequency thresholds, etc.). The systems were sorted by their mean cross-validation score, and the top 50 ranked were selected to build the ensembles. The training set is a collection of 7,219 tweets, each of which is tagged with one of six labels (i.e., P+, P, NEU, N, N+, and NONE). Note that the systems were trained for the six-labels evaluation, and therefore the P+ and P labels were merged into P, as well as the N+ and N labels into N, to produce an output in accordance with the four-labels evaluation. A further description of the provided corpus, as well as of the training and test sets, can be found in
        <xref ref-type="bibr" rid="ref2">(García-Cumbreras et al., 2016)</xref>
        .
      </p>
      <sec id="sec-3-1">
        <title>Results</title>
        <p>[Table 1 and Table 2 report the accuracy of run-1, run-2, and run-3 in the six-labels and four-labels evaluations, respectively; Table 3 reports the performance of run-3 for the P, NEU, N, and NONE classes.]</p>
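        <p>The merge from the six-labels scheme to the four-labels scheme can be sketched as a simple mapping (an illustration of the rule stated above):</p>
        <preformat>
```python
# P+ folds into P, and N+ into N; the remaining labels are unchanged.
MERGE = {"P+": "P", "N+": "N"}

def to_four_labels(label):
    return MERGE.get(label, label)

six = ["P+", "P", "NEU", "N+", "N", "NONE"]
four = [to_four_labels(l) for l in six]  # ["P", "P", "NEU", "N", "N", "NONE"]
```
          </preformat>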
        <p>Next, the top 50 systems assigned a class label to each tweet in a collection of 1,000 tweets, which was drawn from the untagged test set with a class distribution similar to that of the training set. In this stage, the objective was to find the systems with the lowest absolute correlation with each other; therefore, performance was not evaluated. Then, the less-correlated combinations of 5, 10, and 25 systems were used to build the ensembles, whose outputs correspond to the submitted experiments. These experiments are described below:</p>
        <p>run-1: the less-correlated combination of 5 systems, which chooses the class label that represents the majority in the predictions made by the ensemble members.</p>
        <p>run-2: the less-correlated combination of 10 systems, which chooses the class with the highest unweighted average probability.</p>
        <p>run-3: the less-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability.</p>
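        <p>The two combination rules behind the runs can be sketched as follows (the label order and the probability values are illustrative):</p>
        <preformat>
```python
import numpy as np
from collections import Counter

LABELS = ["P", "NEU", "N", "NONE"]

def majority_vote(predictions):
    # run-1: the class label predicted by most ensemble members
    return Counter(predictions).most_common(1)[0][0]

def average_probability(prob_rows):
    # run-2/run-3: the class with the highest unweighted average
    # probability across the ensemble members
    avg = np.mean(prob_rows, axis=0)
    return LABELS[int(np.argmax(avg))]

vote = majority_vote(["P", "N", "P", "NEU", "P"])
avg = average_probability([[0.6, 0.1, 0.2, 0.1],
                           [0.3, 0.1, 0.5, 0.1]])
```
          </preformat>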
        <p>Tables 1 and 2 show the performance evaluation on the test set (i.e., a collection of 60,798 tweets) for six and four labels, respectively. Accuracy has been defined as the official metric for ranking the systems. In summary, the main gain occurs between the "run-1" and "run-2" experiments, with an increment of 0.5% in accuracy in the six-labels evaluation, and of 0.2% in the four-labels evaluation; by contrast, a negligible gain occurs between the "run-2" and "run-3" experiments, especially considering the computational cost of running the latter.</p>
        <p>As a final point, Table 3 shows how the overall performance is affected by the low discriminative power of the ensembles (in this case, the one corresponding to "run-3") for the NEU class. With this in mind, it is proposed as future work to deal with the low representativeness of the NEU class in the training data (i.e., 9.28% of tweets), in order to properly characterize this kind of tweets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper has described an ensemble-based approach for sentiment analysis of Spanish Twitter data at a global level, developed in order to participate in Task 1 proposed by the organization of the TASS workshop. Three ensembles were built on combinations of sentiment analysis systems with the lowest absolute correlation with each other. The systems were adapted to the informal genre and the free writing style that characterize Twitter, in order to improve the quality of natural language analysis. The predicted class label for a particular tweet was based on a majority rule or on the highest unweighted average probability. Experimental results showed that the less-correlated combination of 25 systems, which chose the class with the highest unweighted average probability, was the setting that best suited the task. However, there is considerable room for improvement in learning a proper characterization of neutral tweets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ceron-Guzman</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Leon-Guzman</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Lexical normalization of Spanish tweets</article-title>
          .
          <source>In Proceedings of the 25th International Conference Companion on World Wide Web, WWW'16 Companion</source>
          , pages
          <fpage>605</fpage>
          –
          <lpage>610</lpage>
          . International World Wide Web Conferences Steering Committee.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>García-Cumbreras</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Villena-Román</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martínez-Cámara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Díaz-Galiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of TASS 2016</article-title>
          .
          <source>In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</source>
          , Salamanca, Spain, September.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Gayo-Avello</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>A meta-analysis of state-of-the-art electoral prediction from Twitter data</article-title>
          .
          <source>Social Science Computer Review</source>
          ,
          <volume>31</volume>
          (
          <issue>6</issue>
          ):
          <fpage>649</fpage>
          –
          <lpage>679</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>B</given-names>
          </string-name>
          . and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Lexical normalisation of short text messages: Makn sens a #twitter</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1</source>
          , HLT '11, pages
          <fpage>368</fpage>
          –
          <lpage>378</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and H. Schutze.
          <year>2008</year>
          .
          <article-title>Scoring, term weighting and the vector space model</article-title>
          .
          <source>In An Introduction to Information Retrieval</source>
          . Cambridge University Press, New York, NY, USA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>F. A.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>A new anew: evaluation of a word list for sentiment analysis in microblogs</article-title>
          .
          <source>In Proceedings of the ESWC2011 Workshop on `Making Sense of Microposts': Big things come in small packages</source>
          , pages
          <fpage>93</fpage>
          –
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Stanilovsky</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>FreeLing 3.0: Towards wider multilinguality</article-title>
          .
          <source>In Proceedings of the Language Resources and Evaluation Conference (LREC</source>
          <year>2012</year>
          ), Istanbul, Turkey, May. ELRA.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Thumbs up?: Sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10</source>
          , EMNLP '02, pages
          <fpage>79</fpage>
          –
          <lpage>86</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Saralegi</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Vicente</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Elhuyar at TASS 2013</article-title>
          .
          <source>In Proceedings of the Sentiment Analysis Workshop at SEPLN (TASS2013)</source>
          ,
          <year>September</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Vilares</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alonso</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gómez-Rodríguez</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>