=Paper= {{Paper |id=Vol-1749/paper_027 |storemode=property |title=Convolutional Neural Networks for Sentiment Analysis on Italian Tweets |pdfUrl=https://ceur-ws.org/Vol-1749/paper_027.pdf |volume=Vol-1749 |authors=Giuseppe Attardi,Daniele Sartiano,Chiara Alzetta,Federica Semplici |dblpUrl=https://dblp.org/rec/conf/clic-it/AttardiSAS16 }} ==Convolutional Neural Networks for Sentiment Analysis on Italian Tweets== https://ceur-ws.org/Vol-1749/paper_027.pdf
    Convolutional Neural Networks for Sentiment Analysis on Italian
                               Tweets

           Giuseppe Attardi, Daniele Sartiano, Chiara Alzetta, Federica Semplici
                               Dipartimento di Informatica
                                     Università di Pisa
                                  Largo B. Pontecorvo, 3
                                     I-56127 Pisa, Italy
                           {attardi, sartiano}@di.unipi.it,
                      {c.alzetta, f.semplici}@studenti.unipi.it



                     Abstract

    English. The paper describes our submission to Task 2 of the SENTIment POLarity Classification task for Italian tweets at Evalita 2016. Our approach is based on a convolutional neural network that exploits both standard word embeddings and Sentiment Specific word embeddings. We also experimented with a model trained on a distant-supervised corpus. Our submission with Sentiment Specific word embeddings achieved the top official score.

    Italiano. The paper describes our participation in Task 2 of the SENTIment POLarity Classification task for Italian tweets at Evalita 2016. Our approach is based on a convolutional neural network that exploits both traditional word embeddings and Sentiment Specific word embeddings. We also experimented with a model trained on a corpus built by distant supervision. Our system, which uses Sentiment Specific word embeddings, obtained the top official score.

1    Introduction

The paper describes our submissions to Task 2 of the SENTiment POLarity Classification task (Sentipolc) at Evalita 2016 (Barbieri et al., 2016).
   Sentipolc focuses on the sentiment analysis of Italian tweets and is divided into three subtasks:

   Task 1: Subjectivity Classification: identify whether a tweet is subjective.

   Task 2: Polarity Classification: classify a tweet as positive, negative, neutral or mixed (i.e. a tweet with both positive and negative sentiment).

   Task 3: Irony Detection: identify whether irony is present in a tweet.

The state of the art in the polarity classification of tweets is the application of Deep Learning methods (Nakov et al., 2016), such as convolutional neural networks or recurrent neural networks, in particular long short-term memory networks (Hochreiter and Schmidhuber, 1997).
   We explored Deep Learning techniques for the sentiment analysis of English tweets at SemEval 2016 with good results, and noticed that the use of a convolutional neural network and Sentiment Specific word embeddings was promising.
   We applied a similar approach to the Italian language, building word embeddings from a large corpus of Italian tweets and Sentiment Specific word embeddings from positive and negative tweets, and using a convolutional neural network as classifier. We also introduced a distant-supervised corpus as a silver training set.
   We report the results of our experiments with this approach on Task 2 (Polarity Classification) of the Evalita 2016 Sentipolc task.

2    Description of the System

The architecture of the system consists of the following steps:
   [Figure 1. The Deep Learning classifier: the embeddings of each word of the tweet (e.g. "Oggi non mi sento molto bene EMO_SAD") feed a convolutional layer with multiple filters, followed by max-over-time pooling and a multilayer perceptron with dropout.]

   • build word embeddings from a collection of 167 million tweets, collected with the Twitter API over the period from May to September 2016 and preprocessed as described later;

   • build Sentiment Specific word embeddings using a portion of these tweets, split into positive/negative by distant supervision;

   • train a convolutional neural network classifier using one of the above word embeddings.

The convolutional neural network classifier exploits pre-trained word embeddings as its only features, in the various configurations described below. The architecture of the classifier, depicted in Figure 1, consists of the following layers: a lookup layer for word embeddings, a convolutional layer with a ReLU activation function, a max-pooling layer, a dropout layer, a linear layer with tanh activation and a softmax layer. This is the same classifier described in (Attardi and Sartiano, 2016), which achieved good results at the SemEval 2016 Task 4 on Sentiment Analysis in Twitter (Nakov et al., 2016). Here we test it on a similar task for Italian tweets.

2.1    Data Preprocessing

In order to build the word embeddings, we preprocessed the tweets using tools from the Tanl pipeline (Attardi et al., 2010): the sentence splitter and the specialized tweet tokenizer for the tokenization and normalization of tweets. Normalization involved replacing mentions with the string "@mention", emoticons with their name (e.g. "EMO_SMILE") and URLs with "URL_NORM".

2.2    Word Embeddings and Sentiment Specific Word Embeddings

We experimented with standard word embeddings, in particular building them with the tool word2vec (https://code.google.com/archive/p/word2vec/) (Mikolov et al., 2013), using the skip-gram model. These word embeddings, though, do not take into account semantic differences between words expressing opposite polarity, since they basically encode co-occurrence information, as shown by Levy and Goldberg (2014). To encode sentiment information in the continuous representation of words, we use the technique of Tang et al. (2014) as implemented in the DeepNL library (https://github.com/attardi/deepnl) (Attardi, 2015). A neural network with a suitable loss function provides the supervision for transferring the sentiment polarity of texts into the embeddings built from generic tweets.

2.3    Distant Supervision

The frequency distribution of classes in the dataset, as shown in Table 1, seems skewed and not fully representative of the distribution in a statistical sample of tweets: negative tweets are normally much less frequent than positive or neutral ones (Bravo-Marquez et al., 2015). To reduce this bias and to increase the size of the training set, we selected additional tweets from our corpus of Italian tweets by means of distant supervision. In a first step we selected the tweets belonging to a class (positive, negative, neutral, mixed) via regular expressions. In a second step, the selected tweets were classified by the classifier trained on the task training set. The silver corpus was built by taking the tweets whose class matched between the regular-expression system and the classifier.
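The forward pass of the classifier described above (lookup, convolution with multiple filter widths and ReLU, max-over-time pooling, a tanh hidden layer and a softmax output) can be sketched in plain numpy. This is a minimal illustrative sketch, not the paper's implementation: the filter count and the random parameters stand in for learned weights, and dropout, a training-time device, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_SIZE, HIDDEN, N_CLASSES = 300, 100, 4
FILTER_WIDTHS = (7, 8, 9, 10)   # one of the filter settings tried in the paper
N_FILTERS = 10                  # illustrative number of filters per width

# Illustrative random parameters; in practice these are learned, and the
# lookup table is initialized with the pre-trained word embeddings.
conv = {w: rng.normal(0, 0.1, (N_FILTERS, w * EMB_SIZE)) for w in FILTER_WIDTHS}
W_h = rng.normal(0, 0.1, (HIDDEN, N_FILTERS * len(FILTER_WIDTHS)))
W_o = rng.normal(0, 0.1, (N_CLASSES, HIDDEN))

def forward(tweet_embeddings):
    """tweet_embeddings: (n_words, EMB_SIZE) matrix from the lookup layer."""
    n = len(tweet_embeddings)
    pooled = []
    for w, W in conv.items():
        # pad with zero vectors so short tweets still admit a window of width w
        x = np.vstack([tweet_embeddings, np.zeros((max(0, w - n), EMB_SIZE))])
        windows = np.stack([x[i:i + w].ravel()
                            for i in range(len(x) - w + 1)])
        feat = np.maximum(0, windows @ W.T)   # convolution + ReLU
        pooled.append(feat.max(axis=0))       # max-over-time pooling
    h = np.tanh(W_h @ np.concatenate(pooled))  # linear layer with tanh
    z = np.exp(W_o @ h - (W_o @ h).max())
    return z / z.sum()                         # softmax over the 4 classes

probs = forward(rng.normal(size=(7, EMB_SIZE)))  # e.g. a 7-token tweet
```

Max-over-time pooling is what makes the tweet representation independent of the tweet length: each filter contributes a single feature, its strongest response anywhere in the tweet.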
3    Experiments

The plain word embeddings were built by applying word2vec to a collection of 167 million Italian unlabeled tweets, using the skip-gram model and the following parameters: embedding size 300, window size 5, discarding words that appear less than 5 times. We obtained about 450k word embeddings.
   The Sentiment Specific word embeddings (SWE) were built with DeepNL, starting from the word embeddings built in the previous step and tuning them with a supervised set of positive or negative tweets, obtained as follows from 2.3 million tweets selected randomly from our corpus of collected tweets:

   • Positive tweet: one that contains only emoticons from a set of positive emoticons (e.g. smiles, hearts, laughs, expressions of surprise, angels and high fives).

   • Negative tweet: one that contains only emoticons from a set of negative emoticons (e.g. tears, angry and sad).

Integris srl cooperated in the task by providing a set of 1.3 million tweets, selected by relying on a lexicon of handcrafted polarized words. This resource was also added to the corpus.
   We split the training set provided for the Evalita 2016 SentiPolc task into a train set (5335 tweets), a validation set (592 tweets) and a test set (1482 tweets). This dataset was tokenized and normalized as described in Section 2.1.
   For the sake of participating in subtask 2, polarity classification, the 13-value annotations present in the datasets were converted into four values, "neutral", "positive", "negative" and "mixed", depending on the values of the fields "opos" and "oneg", which express the tweet polarity, according to the task guidelines (http://www.di.unito.it/~tutreeb/sentipolc-evalita16/sentipolc-guidelines2016UPDATED130916.pdf). We did not take into account the values of "lpos" and "lneg".
   The frequency distribution of these classes turns out to be quite unbalanced, as shown in Table 1.

   Class        Train set    Validation set
   Neutral        2262            554
   Negative       2029            513
   Positive       1299            312
   Mixed           337            103
        Table 1. Task dataset distribution.

The training set is still fairly small, compared for example to the size of the corpus used in SemEval 2016. The "mixed" class in particular is small in absolute numbers, even though not in percentage, which makes it hard to properly train a ML classifier.
   Therefore we tried to increase the training set by means of distant supervision as described above: we selected a maximum of 10,000 tweets per class via regular expressions, then classified them with the classifier trained on the gold training set. We chose for addition into a silver training set the tweets which were assigned by the classifier the same class as the regular expression. As reported in Table 2, the silver dataset remains unbalanced; in particular, no "mixed" example was added to the original training set.

   Class        Train set    Dev set
   Neutral        8505          554
   Negative       5987          513
   Positive       6813          312
   Mixed           337          103
   Table 2. Distant supervised dataset distribution.

Table 3 shows the common settings used for training the classifier. We used the same parameters as at SemEval-2016.

   Word Embeddings Size     300
   Hidden Units             100
   Dropout Rate             0.5
   Batch Size                50
   Adadelta Decay          0.95
   Epochs                    50
     Table 3. Network common settings.

We performed extensive experiments with the classifier in various configurations, varying the number of filters, the choice between skip-gram word embeddings and Sentiment Specific word embeddings, and the training set, either the gold or the silver one. The results of the evaluation on the validation set allowed us to choose the best settings, listed in Table 4.

                       Run 1                  Run 2
   Embeddings       WE skip-gram               SWE
   Training set     Gold     Silver     Gold               Silver
   Filters          2,3,5    4,5,6,7    7,7,7,7,8,8,8,8    7,8,9,10
                       Table 4. Best settings.

4    Results

We submitted four runs for subtask 2, "polarity classification", described below.
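Two of the submitted runs use the silver corpus as training set; its construction (Sections 2.3 and 3) amounts to keeping only the tweets on which a regular-expression heuristic and the gold-trained classifier agree, capped at 10,000 tweets per class. A minimal sketch of this agreement filter follows; the patterns and the stand-in classifier are hypothetical, since the paper does not publish the actual regular expressions, and only two of the four classes are shown:

```python
import re

# Hypothetical per-class patterns over normalized emoticon tokens;
# illustrative only, not the paper's actual regular expressions.
CLASS_PATTERNS = {
    "positive": re.compile(r"EMO_(SMILE|HEART|LAUGH)"),
    "negative": re.compile(r"EMO_(SAD|TEARS|ANGRY)"),
}

def regex_label(tweet):
    """First-step heuristic label, or None if not exactly one class matches."""
    hits = [c for c, p in CLASS_PATTERNS.items() if p.search(tweet)]
    return hits[0] if len(hits) == 1 else None

def build_silver(tweets, classify, per_class_cap=10000):
    """Keep tweets where the gold-trained classifier agrees with the regex label."""
    silver, counts = [], {c: 0 for c in CLASS_PATTERNS}
    for t in tweets:
        label = regex_label(t)
        if label and counts[label] < per_class_cap and classify(t) == label:
            silver.append((t, label))
            counts[label] += 1
    return silver

# Toy stand-in for the CNN classifier trained on the gold training set.
toy_classifier = lambda t: "positive" if "EMO_SMILE" in t else "negative"

silver = build_silver(
    ["bella giornata EMO_SMILE", "che disastro EMO_SAD", "boh"],
    toy_classifier)
```

Requiring agreement between the two weak labelers trades corpus size for label precision, which is why the silver set in Table 2 stays unbalanced: classes the heuristics rarely match, such as "mixed", gain no examples.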
   • UniPI_1.c: gold training set, word embeddings with skip-gram model, filters "2,3,5".

   • UniPI_1.u: silver corpus as training set, word embeddings with skip-gram model, filters "4,5,6,7".

   • UniPI_2.c: gold training set, Sentiment Specific word embeddings, filters "7,7,7,7,8,8,8,8".

   • UniPI_2.u: silver corpus as training set, Sentiment Specific word embeddings, filters "7,8,9,10".

Table 5 reports the top official results for subtask 2:

   System        Positive F-score   Negative F-score   Combined F-score
   UniPI_2.c         0.685              0.6426             0.6638
   team1_1.u         0.6354             0.6885             0.662
   team1_2.u         0.6312             0.6838             0.6575
   team4_.c          0.644              0.6605             0.6522
   team3_.1.c        0.6265             0.6743             0.6504
   team5_2.c         0.6426             0.648              0.6453
   team3_.2.c        0.6395             0.6469             0.6432
   UniPI_1.u         0.6699             0.6146             0.6422
   UniPI_1.c         0.6766             0.6002             0.6384
   UniPI_2.u         0.6586             0.5654             0.612
     Table 5. Top official results for SentiPolc subtask 2.

The run UniPI_2.c achieved the top overall score among a total of 26 submissions to task 2. This confirms the effectiveness of Sentiment Specific word embeddings in sentiment polarity classification also for Italian tweets.
   The use of an extended silver corpus did not provide significant benefits, possibly because the resulting corpus was still unbalanced.
   In addition to subtask 2, we submitted one run for Task 1, "Subjectivity Classification": given a message, decide whether it is subjective or objective. We used the same classifier as for subtask 2, with only two classes (subjective, objective), the same skip-gram word embeddings used for the other task, the configuration listed in Table 3 and the filters "7,8,9,10", without performing extensive experiments. Table 6 reports the top official results for subtask 1:

   System        Objective F-score   Subjective F-score   Combined F-score
   team1_1.u         0.6784              0.8105              0.7444
   team1_2.u         0.6723              0.7979              0.7351
   team2_.1.c        0.6555              0.7814              0.7184
   team3_.2.c        0.6733              0.7535              0.7134
   team4_.c          0.6465              0.775               0.7107
   team5_2.c         0.6671              0.7539              0.7105
   team6_.c          0.6623              0.755               0.7086
   team1_.c          0.6499              0.759               0.7044
   UniPI_1           0.6741              0.7133              0.6937
   team3_.1.c        0.6178              0.735               0.6764
   team8_.c          0.5646              0.7343              0.6495
   team5_1.c         0.6345              0.6139              0.6242
     Table 6. Top official results for SentiPolc subtask 1.

5    Discussion

We confirmed the validity of convolutional neural networks for Twitter sentiment classification, also for the Italian language.
   The system achieved the top score in Task 2 of the SENTiment POLarity Classification Task of Evalita 2016.

Acknowledgments

We gratefully acknowledge the support of the University of Pisa through project PRA and of NVIDIA Corporation for the donation of the Tesla K40 GPU used in the experiments.
   Integris srl cooperated by providing a corpus of sentiment-annotated tweets.

References

Giuseppe Attardi, Stefano Dei Rossi, and Maria Simi. 2010. The Tanl Pipeline. In Proc. of LREC Workshop on WSPP, Malta.

Giuseppe Attardi. 2015. DeepNL: a Deep Learning NLP pipeline. In Proc. of the Workshop on Vector Space Modeling for NLP, NAACL HLT 2015, Denver, Colorado (June 5, 2015).

Giuseppe Attardi and Daniele Sartiano. 2016. UniPI at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification. In Proceedings of SemEval, 220-224.

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Felipe Bravo-Marquez, Eibe Frank, and Bernhard Pfahringer. 2015. Positive, negative, or neutral: Learning an expanded opinion lexicon from emoticon-annotated tweets. In IJCAI 2015. AAAI Press.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780.

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. arXiv:1310.4546.

Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 Task 4: Sentiment analysis in Twitter. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), San Diego, US.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. In ACL (1), pp. 1555-1565.