=Paper=
{{Paper
|id=Vol-1749/paper_027
|storemode=property
|title=Convolutional Neural Networks for Sentiment Analysis on Italian Tweets
|pdfUrl=https://ceur-ws.org/Vol-1749/paper_027.pdf
|volume=Vol-1749
|authors=Giuseppe Attardi,Daniele Sartiano,Chiara Alzetta,Federica Semplici
|dblpUrl=https://dblp.org/rec/conf/clic-it/AttardiSAS16
}}
==Convolutional Neural Networks for Sentiment Analysis on Italian Tweets==
Giuseppe Attardi, Daniele Sartiano, Chiara Alzetta, Federica Semplici
Dipartimento di Informatica, Università di Pisa
Largo B. Pontecorvo, 3, I-56127 Pisa, Italy
{attardi, sartiano}@di.unipi.it, {c.alzetta, f.semplici}@studenti.unipi.it

Abstract

English. The paper describes our submission to task 2 of the SENTIment POLarity Classification in Italian Tweets shared task at Evalita 2016. Our approach is based on a convolutional neural network that exploits both word embeddings and Sentiment Specific word embeddings. We also experimented with a model trained on a distant-supervised corpus. Our submission with Sentiment Specific word embeddings achieved the first official score.

Italiano. The paper describes our participation in task 2 of SENTIment POLarity Classification in Italian Tweets at Evalita 2016. Our approach is based on a convolutional neural network that exploits both traditional word embeddings and sentiment specific word embeddings. We also experimented with a model trained on a corpus built by distant supervision. Our system, which uses Sentiment Specific word embeddings, achieved the first official score.

1 Introduction

The paper describes our submissions to Task 2 of the SENTiment POLarity Classification task at Evalita 2016 (Barbieri et al., 2016). Sentipolc focuses on the sentiment analysis of Italian tweets and is divided into three subtasks:

Task 1: Subjectivity Classification: identify the subjectivity of a tweet.

Task 2: Polarity Classification: classify a tweet as positive, negative, neutral or mixed (i.e. a tweet with both positive and negative sentiment).

Task 3: Irony Detection: identify whether irony is present in a tweet.

The state of the art in the polarity classification of tweets is the application of Deep Learning methods (Nakov et al., 2016), such as convolutional neural networks or recurrent neural networks, in particular long short-term memory networks (Hochreiter and Schmidhuber, 1997).

We explored Deep Learning techniques for the sentiment analysis of English tweets at SemEval 2016 with good results, where we noticed that the use of a convolutional neural network and Sentiment Specific word embeddings was promising. We applied a similar approach to the Italian language, building word embeddings from a big corpus of Italian tweets and sentiment specific word embeddings from positive and negative tweets, and using a convolutional neural network as classifier. We also introduced a distant-supervised corpus as a silver training set. We report the results of our experiments with this approach on the Evalita 2016 Sentipolc Task 2, Polarity Classification.

2 Description of the System

The architecture of the system consists of the following steps:

- build word embeddings from a collection of 167 million tweets collected with the Twitter API over the period from May to September 2016, preprocessed as described later;
- build Sentiment Specific word embeddings using a portion of these tweets split into positive/negative by distant supervision;
- train a convolutional neural network classifier using one of the above word embeddings.

The convolutional neural network classifier exploits pre-trained word embeddings as its only features, in the various configurations described below. The architecture of the classifier consists of the following layers, depicted in Figure 1: a lookup layer for word embeddings, a convolutional layer with a ReLU activation function, a max-pooling layer, a dropout layer, a linear layer with tanh activation and a softmax layer. This is the same classifier described in (Attardi and Sartiano, 2016), which achieved good results at the SemEval 2016 task 4 on Sentiment Analysis in Twitter (Nakov et al., 2016). Here we test it on a similar task for Italian tweets.

Figure 1. The Deep Learning classifier: embeddings for each word, convolutional layer with multiple filters, max-over-time pooling, multilayer perceptron with dropout (example input: "Oggi non mi sento molto bene EMO_SAD").

2.1 Data Preprocessing

In order to build the word embeddings, we preprocessed the tweets using tools from the Tanl pipeline (Attardi et al., 2010): the sentence splitter and the specialized tweet tokenizer for the tokenization and normalization of tweets. Normalization involved replacing mentions with the string "@mention", emoticons with their name (e.g. "EMO_SMILE") and URLs with "URL_NORM".

2.2 Word Embeddings and Sentiment Specific Word Embeddings

We experimented with standard word embeddings, in particular building them with the tool word2vec [1] (Mikolov et al., 2013), using the skip-gram model. These word embeddings, though, do not take into account semantic differences between words expressing opposite polarity, since they basically encode co-occurrence information, as shown by (Levy and Goldberg, 2014). For encoding sentiment information into the continuous representation of words, we use the technique of Tang et al. (2014) as implemented in the DeepNL [2] library (Attardi, 2015). A neural network with a suitable loss function provides the supervision for transferring the sentiment polarity of texts into the embeddings built from generic tweets.

2.3 Distant Supervision

The frequency distribution of classes in the dataset, as shown in Table 1, seems skewed and not fully representative of the distribution in a statistical sample of tweets: negative tweets are normally much less frequent than positive or neutral ones (Bravo-Marquez et al., 2015). To reduce this bias and to increase the size of the training set, we selected additional tweets from our corpus of Italian tweets by means of distant supervision. In the first step we selected the tweets belonging to a class (positive, negative, neutral, mixed) via regular expressions. In the second step, the selected tweets are classified by the classifier trained on the task training set. The silver corpus is built by taking the tweets on whose class the regular expression system and the classifier agree.

3 Experiments

The plain word embeddings were built by applying word2vec to a collection of 167 million unlabeled Italian tweets, using the skip-gram model and the following parameters: embeddings size 300, window dimension 5, discarding words that appear less than 5 times. We obtained about 450k word embeddings.

The sentiment specific word embeddings (SWE) were built with DeepNL, starting from the word embeddings built in the previous step and tuning them with a supervised set of positive or negative tweets, obtained as follows from 2.3 million tweets selected randomly from our corpus of collected tweets:

- Positive tweet: one that contains only emoticons from a set of positive emoticons (e.g. smiles, hearts, laughs, expressions of surprise, angels and high fives).
- Negative tweet: one that contains only emoticons from a set of negative emoticons (e.g. tears, angry and sad).

Integris srl cooperated in the task by providing a set of 1.3 million tweets, selected by relying on a lexicon of handcrafted polarized words. This resource was also added to the corpus.

We split the training set provided for the Evalita 2016 SentiPolc task into a train set (5335 tweets), a validation set (592 tweets) and a test set (1482 tweets). This dataset was tokenized and normalized as described in Section 2.1.

For the sake of participating in subtask 2, polarity classification, the 13-value annotations present in the datasets were converted into four values, "neutral", "positive", "negative" and "mixed", depending on the values of the fields "opos" and "oneg", which express the tweet polarity, according to the task guidelines [3]. We did not take into account the values of "lpos" and "lneg".

The frequency distribution of these classes turns out to be quite unbalanced, as shown in Table 1.

  Class     Train set  Validation set
  Neutral   2262       554
  Negative  2029       513
  Positive  1299       312
  Mixed     337        103

Table 1. Task dataset distribution.

The training set is still fairly small, compared for example to the size of the corpus used at SemEval 2016. The "mixed" class in particular is small in absolute numbers, even though not in percentage, which makes it hard to properly train a ML classifier. Therefore we tried to increase the training set by means of distant supervision as described above: we selected a maximum of 10,000 tweets per class via regular expressions, then we classified them with the classifier trained on the gold training set. We chose for addition to a silver training set the tweets to which the classifier assigned the same class as the regular expressions. As reported in Table 2, the silver dataset remains unbalanced; in particular, no "mixed" example was added to the original train set.

  Class     Train set  Dev set
  Neutral   8505       554
  Negative  5987       513
  Positive  6813       312
  Mixed     337        103

Table 2. Distant supervised dataset distribution.

Table 3 shows the common settings used for training the classifier. We used the same parameters as at SemEval-2016.

  Word Embeddings Size  300
  Hidden Units          100
  Dropout Rate          0.5
  Batch Size            50
  Adadelta Decay        0.95
  Epochs                50

Table 3. Network common settings.

We performed extensive experiments with the classifier in various configurations, varying the number of filters, the use of skip-gram word embeddings or sentiment specific word embeddings, and the training set, either the gold one or the silver one. The results of the evaluation on the validation set allowed us to choose the best settings, as listed in Table 4.

                Run 1 (WE skip-gram)      Run 2 (SWE)
  Training set  Gold     Silver           Gold             Silver
  Filters       2,3,5    4,5,6,7          7,7,7,7,8,8,8,8  7,8,9,10

Table 4. Best settings.

4 Results

We submitted four runs for subtask 2, "polarity classification":

- UniPI_1.c: gold training set, word embeddings with skip-gram model, filters "2,3,5".
- UniPI_1.u: silver corpus as training set, word embeddings with skip-gram model, filters "4,5,6,7".
- UniPI_2.c: gold training set, sentiment specific word embeddings, filters "7,7,7,7,8,8,8,8".
- UniPI_2.u: silver corpus as training set, sentiment specific word embeddings, filters "7,8,9,10".

The following table reports the top official results for subtask 2:

  System      Positive F-score  Negative F-score  Combined F-score
  UniPI_2.c   0.685             0.6426            0.6638
  team1_1.u   0.6354            0.6885            0.662
  team1_2.u   0.6312            0.6838            0.6575
  team4_.c    0.644             0.6605            0.6522
  team3_.1.c  0.6265            0.6743            0.6504
  team5_2.c   0.6426            0.648             0.6453
  team3_.2.c  0.6395            0.6469            0.6432
  UniPI_1.u   0.6699            0.6146            0.6422
  UniPI_1.c   0.6766            0.6002            0.6384
  UniPI_2.u   0.6586            0.5654            0.612

Table 5. Top official results for SentiPolc subtask 2.

In addition to subtask 2, we submitted one run for Task 1, "Subjectivity Classification": given a message, decide whether the message is subjective or objective. We used the same classifier as for subtask 2, with only two classes (subjective, objective), with the same skip-gram word embeddings used for the other task and the configuration listed in Table 3, using the filters "7,8,9,10", without performing extensive experiments. The following table reports the top official results for subtask 1:

  System      Objective F-score  Subjective F-score  Combined F-score
  team1_1.u   0.6784             0.8105              0.7444
  team1_2.u   0.6723             0.7979              0.7351
  team2_.1.c  0.6555             0.7814              0.7184
  team3_.2.c  0.6733             0.7535              0.7134
  team4_.c    0.6465             0.775               0.7107
  team5_2.c   0.6671             0.7539              0.7105
  team6_.c    0.6623             0.755               0.7086
  team1_.c    0.6499             0.759               0.7044
  UniPI_1     0.6741             0.7133              0.6937
  team3_.1.c  0.6178             0.735               0.6764
  team8_.c    0.5646             0.7343              0.6495
  team5_1.c   0.6345             0.6139              0.6242

Table 6. Top official results for SentiPolc subtask 1.

5 Discussion

We confirmed the validity of convolutional neural networks for Twitter sentiment classification, also for the Italian language. The system achieved the top score in task 2 of the SENTiment POLarity Classification task of Evalita 2016.

The run UniPI_2.c achieved the top overall score among a total of 26 submissions to task 2. This confirms the effectiveness of sentiment specific word embeddings for sentiment polarity classification also on Italian tweets.

The use of an extended silver corpus did not provide significant benefits, possibly because the resulting corpus was still unbalanced.

Acknowledgments

We gratefully acknowledge the support of the University of Pisa through project PRA and of NVIDIA Corporation for the donation of the Tesla K40 GPU used in the experiments. Integris srl cooperated by providing a corpus of sentiment annotated tweets.

Footnotes

[1] https://code.google.com/archive/p/word2vec/
[2] https://github.com/attardi/deepnl
[3] http://www.di.unito.it/~tutreeb/sentipolc-evalita16/sentipolc-guidelines2016UPDATED130916.pdf

References

Giuseppe Attardi, Stefano Dei Rossi, and Maria Simi. 2010. The Tanl Pipeline. In Proc. of LREC Workshop on WSPP, Malta.

Giuseppe Attardi. 2015. DeepNL: a Deep Learning NLP pipeline. In Workshop on Vector Space Modeling for NLP, Proc. of NAACL HLT 2015, Denver, Colorado.

Giuseppe Attardi and Daniele Sartiano. 2016. UniPI at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification. In Proceedings of SemEval, 220-224.

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Felipe Bravo-Marquez, Eibe Frank, and Bernhard Pfahringer. 2015. Positive, negative, or neutral: Learning an expanded opinion lexicon from emoticon-annotated tweets. In IJCAI 2015. AAAI Press.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735-1780.

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. arXiv:1310.4546.

Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 task 4: Sentiment analysis in Twitter. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), San Diego, US.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. In ACL (1), pp. 1555-1565.
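As an illustration of the tweet normalization described in Section 2.1, the replacements can be sketched in a few lines of Python. This is a simplified stand-in, not the actual Tanl tweet tokenizer: the regular expressions and the emoticon table below are illustrative assumptions.

```python
import re

# Illustrative emoticon table; the real tokenizer uses a larger inventory.
EMOTICONS = {
    ":)": "EMO_SMILE",
    ":-)": "EMO_SMILE",
    ":(": "EMO_SAD",
    ":-(": "EMO_SAD",
}

MENTION_RE = re.compile(r"@\w+")        # user mentions -> "@mention"
URL_RE = re.compile(r"https?://\S+")    # URLs -> "URL_NORM"

def normalize(tweet: str) -> str:
    """Replace mentions, URLs and emoticons as described in Section 2.1."""
    tweet = MENTION_RE.sub("@mention", tweet)
    tweet = URL_RE.sub("URL_NORM", tweet)
    for emo, name in EMOTICONS.items():
        tweet = tweet.replace(emo, name)
    return tweet
```

For example, `normalize("Oggi non mi sento molto bene :( @luca")` yields `"Oggi non mi sento molto bene EMO_SAD @mention"`, matching the normalized input shown in Figure 1.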
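The emoticon-based selection rule used in Section 3 to build the positive/negative training set for the sentiment specific embeddings can be sketched as follows. The emoticon inventories here are small illustrative stand-ins for the sets described in the paper (smiles, hearts, laughs, etc. for positive; tears, angry, sad for negative), assuming emoticons have already been normalized to `EMO_*` tokens as in Section 2.1.

```python
# Illustrative stand-ins for the paper's positive/negative emoticon sets.
POSITIVE = {"EMO_SMILE", "EMO_HEART", "EMO_LAUGH"}
NEGATIVE = {"EMO_SAD", "EMO_ANGRY", "EMO_CRY"}

def emoticon_label(tokens):
    """Return 'positive'/'negative' if the tweet contains emoticons from
    only one of the two sets; otherwise None (tweet is not used)."""
    emos = {t for t in tokens if t.startswith("EMO_")}
    if emos and emos <= POSITIVE:
        return "positive"
    if emos and emos <= NEGATIVE:
        return "negative"
    return None
```

Tweets mixing emoticons from both sets, or containing none, receive no label and are excluded from the SWE tuning set.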
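The conversion of the "opos"/"oneg" annotations into the four classes used for subtask 2 (Section 3) amounts to the following mapping; the 0/1 flag encoding is assumed here from the task guidelines rather than stated in the paper.

```python
def polarity_class(opos: int, oneg: int) -> str:
    """Map the Sentipolc 'opos'/'oneg' polarity flags to the four classes:
    (1,1) -> mixed, (1,0) -> positive, (0,1) -> negative, (0,0) -> neutral."""
    if opos and oneg:
        return "mixed"
    if opos:
        return "positive"
    if oneg:
        return "negative"
    return "neutral"
```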
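The agreement filter that builds the silver corpus (Sections 2.3 and 3) can be sketched as below. `regex_label` and `classifier_predict` are hypothetical stand-ins for the paper's regular-expression selector and the CNN trained on the gold training set; the per-class cap of 10,000 regex-selected tweets follows Section 3.

```python
def build_silver_corpus(tweets, regex_label, classifier_predict, cap=10000):
    """Keep tweets whose regular-expression label agrees with the
    classifier's prediction, considering at most `cap` regex-selected
    tweets per class."""
    selected = {}  # class -> number of regex-selected tweets seen so far
    silver = []
    for tweet in tweets:
        label = regex_label(tweet)
        if label is None:
            continue  # no regex matched: tweet not selected
        if selected.get(label, 0) >= cap:
            continue  # per-class quota reached
        selected[label] = selected.get(label, 0) + 1
        if classifier_predict(tweet) == label:
            silver.append((tweet, label))
    return silver
```

Only tweets on which the two systems agree enter the silver training set, which is why some classes (notably "mixed") may gain no examples at all.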
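The convolution and max-over-time pooling stage of the classifier in Figure 1 can be illustrated in pure Python. This is a didactic sketch of that single operation only, not the actual implementation, which also includes a ReLU convolutional layer trained end to end, dropout, a tanh linear layer and a softmax; filter weights and embeddings here are arbitrary toy values.

```python
def conv_max_over_time(embeddings, filters):
    """One convolution + max-over-time pooling step. `embeddings` is a
    list of word vectors; each filter is (width, weights) with
    len(weights) == width * embedding_size. Returns one max-pooled
    feature per filter, after a ReLU."""
    features = []
    for width, weights in filters:
        best = 0.0  # the ReLU floor doubles as the max-pool initial value
        for start in range(len(embeddings) - width + 1):
            # Flatten the window of `width` consecutive word vectors.
            window = [x for vec in embeddings[start:start + width] for x in vec]
            act = max(0.0, sum(w * x for w, x in zip(weights, window)))
            best = max(best, act)
        features.append(best)
    return features
```

Using several filter widths at once (e.g. "7,8,9,10" in run UniPI_2.u) simply concatenates one pooled feature per filter before the fully connected layers.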