Introduction

ELiRF-UPV at MultiStanceCat 2018

José-Ángel González

Lluís-Felip Hurt

Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València

2018

pp. 173-179

This paper describes the participation of the ELiRF-UPV team in the Spanish subtasks of the MultiModal Stance Detection in tweets on Catalan #1Oct Referendum workshop. Our best approach is based on Convolutional Neural Networks using word embeddings and polarity/emotion lexicons. We obtained competitive results on the Spanish subtask using only the text of the tweet, without using the context or the images.

Deep Learning Networks

Introduction

The corpus is composed of tweets labeled with respect to their stance on the Catalan first of October Referendum (2017). There are three classes: AGAINST (AG), NEUTRAL (NE) and FAVOR (FA). The tweets are provided in Spanish and Catalan; however, we worked only on the Spanish subtask. Moreover, although the organizers also provide the context of each tweet and images, we only used the text of the tweet.

From the official training corpus, we randomly selected 80% to train our models. The remaining 20% was used as the development set. Table 1 shows the sample distribution per class in the Spanish corpus.

In this section, we describe the two models used in the competition. Both models share the same preprocessing of the tweets by means of the TweetMotif [2] package. We applied a normalization step consisting of lowercasing the words, removing some language-specific characters (accents, diereses and other special characters), and normalizing Twitter-specific tokens (hashtags, user mentions and URLs) by replacing them with a fixed word, e.g. #1octL6 → #hashtag.
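The normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: it does not use TweetMotif, the regular expressions are simplified, and the placeholder words `url` and `@user` are assumptions modelled on the `#hashtag` example given in the text.

```python
import re
import unicodedata

# Simplified patterns; the placeholder words "url" and "@user" are assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def strip_diacritics(text: str) -> str:
    """Drop combining marks (accents, diereses, tildes) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_tweet(text: str) -> str:
    """Lowercase, replace Twitter-specific tokens with fixed words, strip diacritics."""
    text = text.lower()
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("@user", text)
    text = HASHTAG_RE.sub("#hashtag", text)
    text = strip_diacritics(text)
    return " ".join(text.split())  # collapse repeated whitespace


example = normalize_tweet("Votaré en el #1octL6 con @Alguien https://t.co/abc")
```

Note that stripping combining marks also maps "ñ" to "n"; whether the original system did this is not stated in the text.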

As the first model for the experimentation, we used a Support Vector Machine (SVM) classifier with different representations of the tweets. Concretely, we used bag-of-word-ngrams and bag-of-char-ngrams with several values of n, including combinations of n-grams (e.g. bag-of-word-1-4grams means the concatenation of the n-grams for n in [1, 4]).
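To make the bag-of-ngrams representations concrete, the sketch below counts word and character n-grams over a range of n using only the standard library. The function names are hypothetical and the example sentence is invented; the text does not say which toolkit was used to build the features or train the SVM.

```python
from collections import Counter
from typing import List


def word_ngrams(tokens: List[str], n_min: int, n_max: int) -> Counter:
    """Count all word n-grams with n_min <= n <= n_max (e.g. 1 and 4 for 1-4grams)."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text: str, n_min: int, n_max: int) -> Counter:
    """Count all character n-grams with n_min <= n <= n_max."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# bag-of-word-1-4grams for a four-token tweet: 4 + 3 + 2 + 1 distinct n-grams
features = word_ngrams("no quiero votar hoy".split(), 1, 4)
```

In practice these counts would be mapped to a sparse vector and passed to a linear SVM; in scikit-learn, for instance, `CountVectorizer(ngram_range=(1, 4))` followed by `LinearSVC` would play that role.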

As the second model for the experimentation, we used a Convolutional Neural Network (CNN) architecture inspired by the work presented in [12], with the aim of obtaining representations of the tweets similar to continuous versions of the bag-of-ngrams. We represented the tweets using Word2Vec distributed representations of words [3] [4]. Moreover, to enrich the system, we combined several polarity/emotion lexicons with the word embeddings.

We used ELHPolar [8], ISOL [7], MLSenticon [1] and the Spanish version of NRC [6] as lexicons. As word embeddings, we trained a skip-gram model, with 300 dimensions for each word, from 87 million Spanish tweets collected for previous experimental work.

We represent each tweet x as a matrix $S \in \mathbb{R}^{n \times (d+v)}$, where n is the maximum number of words per tweet, d is the dimensionality of the word embeddings and v is the dimensionality of the polarity/emotion features, that is, the number of polarity/emotion lexicons. To obtain this representation, we use an embedding model $h(w) \in \mathbb{R}^{d}$ and a set of lexicons $h'(w) = [h'_1(w), h'_2(w), \ldots, h'_v(w)] \in \mathbb{R}^{v}$, where $h'_k(w)$ is the polarity value of the word w in lexicon k.

Therefore, given a tweet x with n tokens, $x = w_1, w_2, \ldots, w_n$, we represent it as a matrix S in which each row i is the concatenation of the embedding of $w_i$, $h(w_i)$, and a vector with the polarity values of $w_i$ in each lexicon, $h'(w_i)$,

$S = [h(w_1) \mid h'(w_1);\ h(w_2) \mid h'(w_2);\ \ldots;\ h(w_n) \mid h'(w_n)]$. When a word $w_i$ is out of vocabulary for the embedding model, we replace its embedding by the embedding of the word "unknown", $h(w_i) = h(\text{unknown})$. Similarly, if $w_i$ is not included in any lexicon, $h'(w_i) = [0, 0, \ldots, 0] \in \mathbb{R}^{v}$.

Due to the variable length of the tweets, we used zero padding at the start of a tweet if it does not reach the maximum specified length. Otherwise, if the length of a tweet is greater than the maximum, we only consider the first n words of the tweet. In this task, the average number of words per tweet is $n_{avg} = 18.5$, and the maximum length is $n_{max} = 34$. We decided to set the length to n = 26, which is the mean of $n_{avg}$ and $n_{max}$ (26.25) rounded to an integer.
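The construction of the sentence matrix S, including the out-of-vocabulary fallback, truncation and zero padding at the start, can be sketched as below. This is an illustrative sketch with toy data: the function name, the dictionary-based embedding and lexicon stores, and the per-lexicon default of 0 for a word missing from one lexicon are assumptions.

```python
from typing import Dict, List, Sequence


def sentence_matrix(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    lexicons: Sequence[Dict[str, float]],
    n: int,
    unk: str = "unknown",
) -> List[List[float]]:
    """Build the n x (d + v) matrix: each row is h(w_i) concatenated with h'(w_i)."""
    d = len(embeddings[unk])
    v = len(lexicons)
    rows = []
    for w in tokens[:n]:  # keep only the first n words
        emb = embeddings.get(w, embeddings[unk])  # out-of-vocabulary fallback
        pol = [lex.get(w, 0.0) for lex in lexicons]  # assumed: 0 if missing from a lexicon
        rows.append(list(emb) + pol)
    padding = [[0.0] * (d + v) for _ in range(n - len(rows))]
    return padding + rows  # zero padding at the start of the tweet


# Toy setting with d = 2 and v = 2 (the paper uses d = 300, v = 4 lexicons, n = 26).
toy_embeddings = {"unknown": [0.5, 0.5], "bueno": [1.0, 0.0]}
toy_lexicons = [{"bueno": 1.0}, {"malo": -1.0}]
S = sentence_matrix(["bueno", "xyz"], toy_embeddings, toy_lexicons, n=3)
```

Here the first row is padding, the second is the row for "bueno" and the third uses the "unknown" embedding with zero polarity values.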

Regarding the CNN architecture, we applied one-dimensional convolutions with variable-height filters in order to extract the temporal structure of the tweet over several region sizes. Figure 1 summarizes the model architecture and its hyperparameters.

[Figure 1: CNN architecture and hyperparameters. Blocks shown: sentence matrix in $\mathbb{R}^{n \times (d+v)}$ (example input: "do you think humans have sense") with Batch Normalization; 1D Convolution with 4 region sizes ([1, 4]) and 256 filters for each region size (1024 different filters), with Batch Normalization + ReLU, giving 256 feature maps for each region size; Global Max Pooling, giving 256 salient features for each region size ($\mathbb{R}^{4 \times 256}$); Concat + Batch Normalization, giving the concatenated salient features ($\mathbb{R}^{1024}$); softmax fully-connected layer with 3 classes.]

As can be seen in Figure 1, we used 4 different region sizes (the filter heights range from 1 to 4) and 256 filters for each region size. We used this range of region sizes because, in the development phase, the best baseline was the SVM using bag-of-word-1-4grams. After the filters were applied, we obtained 256 output feature maps for each region size.

In order to extract the most salient features for each region size, we applied 1D Global Max Pooling to the feature maps of each region size. We therefore obtained 4 vectors with 256 components each, which were concatenated and used as input to a fully-connected layer that performs the classification task. We used a softmax activation function to model the posterior distribution of each class at the output layer.
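The forward pass described above (convolutions over several region sizes, global max pooling, concatenation and a softmax layer) can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions: weights are random, batch normalization is omitted, ReLU is applied directly after the convolution, and the sizes are tiny (2 filters per region size instead of 256).

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv1d_relu(S: Matrix, filters: List[Matrix]) -> Matrix:
    """Slide each filter (height h, full row width) over S; return one ReLU map per filter."""
    h, width = len(filters[0]), len(S[0])
    maps = []
    for f in filters:
        fmap = []
        for i in range(len(S) - h + 1):
            act = sum(f[r][c] * S[i + r][c] for r in range(h) for c in range(width))
            fmap.append(max(0.0, act))
        maps.append(fmap)
    return maps


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(S: Matrix, filter_banks: List[List[Matrix]], W: Matrix, b: List[float]) -> List[float]:
    pooled: List[float] = []
    for filters in filter_banks:  # one bank per region size
        pooled.extend(max(fmap) for fmap in conv1d_relu(S, filters))  # global max pooling
    logits = [sum(w * x for w, x in zip(row, pooled)) + bias for row, bias in zip(W, b)]
    return softmax(logits)


rng = random.Random(0)
n, width, n_filters, n_classes = 6, 3, 2, 3
S = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(n)]
filter_banks = [
    [[[rng.uniform(-1, 1) for _ in range(width)] for _ in range(h)] for _ in range(n_filters)]
    for h in (1, 2, 3, 4)  # region sizes 1 to 4
]
W = [[rng.uniform(-1, 1) for _ in range(4 * n_filters)] for _ in range(n_classes)]
b = [0.0] * n_classes
probs = forward(S, filter_banks, W, b)  # posterior over the 3 classes
```

With the hyperparameters reported in the text, the pooled vector would have 4 × 256 = 1024 components rather than the 8 used here.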

Experimental results

In this section, we describe the experimental work conducted by the ELiRF team in the MultiStanceCat task. In addition, we present a study of the performance of our best system in the competition.

Table 2 summarizes the results obtained in the development phase. Three different classifiers were considered: Linear SVM with bag-of-ngrams of words, Linear SVM with bag-of-ngrams of chars, and CNNs.

For the SVM approaches, we tested different values of n. With respect to the CNN, we explored three loss functions: Cross Entropy (CCE), Mean Squared Error (MSE) and a differentiable approximation of the F1 measure (SMF1). It can be observed in Table 2 that bag-of-chars generally performs worse than bag-of-words. Note that the CNN models outperform the SVM classifiers. Moreover, the CNN classifier with the SMF1 loss function outperforms all the other classifiers. However, a deeper study of which factors, such as embeddings or lexicons, are most relevant to the results would be interesting.
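The text does not give the definition of SMF1. As an illustration only, one common way to obtain a differentiable approximation of F1 is to replace hard counts of true positives, false positives and false negatives by sums of predicted probabilities. The sketch below follows that formulation; averaging over all classes is also an assumption.

```python
from typing import List


def soft_macro_f1_loss(
    y_true: List[List[float]],
    y_prob: List[List[float]],
    eps: float = 1e-8,
) -> float:
    """1 - macro-averaged soft F1, computed from one-hot labels and class probabilities."""
    n_classes = len(y_true[0])
    f1_sum = 0.0
    for c in range(n_classes):
        # Soft counts: probabilities replace hard 0/1 decisions.
        tp = sum(t[c] * p[c] for t, p in zip(y_true, y_prob))
        fp = sum((1.0 - t[c]) * p[c] for t, p in zip(y_true, y_prob))
        fn = sum(t[c] * (1.0 - p[c]) for t, p in zip(y_true, y_prob))
        f1_sum += 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - f1_sum / n_classes


one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
loss_perfect = soft_macro_f1_loss(one_hot, one_hot)
loss_uniform = soft_macro_f1_loss(one_hot, uniform)
```

Because every term is a smooth function of the probabilities, the loss can be minimized by gradient descent, unlike the F1 measure computed from hard predictions.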

It can also be observed that the value of the F1 measure for the NEUTRAL class, F1(NE), is generally lower than the F1 measures for the AGAINST and FAVOR classes, F1(AG) and F1(FA). We hypothesize that this is because the NEUTRAL class has fewer samples in the corpus. However, low values of F1(NE) do not affect the official evaluation measure, which is defined as the average of F1(AG) and F1(FA).
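The official measure, as defined above, can be computed as in the following sketch. The function names and the toy gold/predicted labels are hypothetical.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Standard F1 for one class from hard predictions."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def official_score(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of F1(AG) and F1(FA); F1(NE) is not part of the average."""
    return (f1_for_class(gold, pred, "AG") + f1_for_class(gold, pred, "FA")) / 2.0


gold = ["AG", "AG", "FA", "NE", "FA"]
pred = ["AG", "NE", "FA", "AG", "FA"]
score = official_score(gold, pred)  # F1(AG) = 0.5, F1(FA) = 1.0
```

NEUTRAL samples still influence the score indirectly: a NEUTRAL tweet predicted as AGAINST, as in the toy example, lowers the precision of the AGAINST class.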

For the Spanish subtask competition, we selected the best CNN and SVM models according to the results obtained in the development phase. Concretely, our first run (ELiRF-1) was the CNN model trained using the SMF1 loss function. As a second run (ELiRF-2), we selected a Linear SVM with bag-of-word-1-4grams.

Table 3 shows the confusion matrices of the two submitted systems. It can be observed that both systems confuse the NEUTRAL and the AGAINST classes in a similar way. The ELiRF-1 run performs best because it predicts the FAVOR class better. We have also studied the samples that the ELiRF-1 system misclassified with high confidence. Some of these samples are shown in Table 4. We think that, in some cases, errors could be avoided by considering hashtags (sample 5, #4gatos) or user mentions (sample 2, @CatalunyaPlural). Unfortunately, we have not included this information in our models.

Table 5 shows the results on the test set for all the participating teams in the Spanish task. The ELiRF-1 run obtained competitive results without using the text of the previous and next tweets or the images in the user timeline. Moreover, the context seems to be useful for this task, because all the best participating teams used this information. Finally, we would like to highlight the large difference between the results obtained on the development and the test sets. We have no explanation for this, but we think that this aspect should be studied once the test set is available.

In this paper, we have presented the participation of the ELiRF team in the MultiStanceCat track of the IberEval workshop. Our team participated in the Spanish subtask of this track, and competitive results were achieved using only the text of the tweets. Our best approach is based on a CNN with a sequential representation of the tweets using word embeddings and polarity/emotion lexicons.

As future work, we plan to include the context of the tweet in our deep learning system in a similar way as Hierarchical Attention Networks [11] do. Moreover, we think that data augmentation could help to improve the performance of the models.

We have observed that hashtags and user mentions contain relevant information for this task. For this reason, as future work, we want to explore the inclusion of this information in the tweet representation.

Acknowledgements

This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R). The work of José-Ángel González is also financed by Universitat Politècnica de València under grant PAID-01-17.

1. Cruz, F.L., Troyano, J.A., Pontes, B., Ortega, F.J.: Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Systems with Applications 41(13), 5984-5994 (2014)

2. Krieger, M., Ahn, D.: Tweetmotif: exploratory search and topic summarization for twitter. In: Proc. of AAAI Conference on Weblogs and Social (2010)

3. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013), http://arxiv.org/abs/1301.3781

4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. CoRR abs/1310.4546 (2013), http://arxiv.org/abs/1310.4546

5. Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: Semeval-2016 task 6: Detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). pp. 31-41 (2016)

6. Mohammad, S.M., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29(3), 436-465 (2013)

7. Molina-González, M.D., Martínez-Cámara, E., Martín-Valdivia, M.T., Perea-Ortega, J.M.: Semantic orientation for polarity classification in spanish reviews. Expert Systems with Applications 40(18), 7250-7257 (2013)

8. Saralegi, X., San Vicente, I.: Elhuyar at tass 2013. In: XXIX Congreso de la Sociedad Española de Procesamiento de lenguaje natural, Workshop on Sentiment Analysis at SEPLN (TASS2013). pp. 143-150 (2013)

9. Taulé, M., Martí, M.A., Rangel, F.M., Rosso, P., Bosco, C., Patti, V., et al.: Overview of the task on stance and gender detection in tweets on catalan independence at ibereval 2017. In: 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2017. vol. 1881, pp. 157-177. CEUR-WS (2017)

10. Taulé, M., Rangel, F.M., Martí, M.A., Rosso, P.: Overview of the task on multimodal stance detection in tweets on catalan #1oct referendum. In: Third Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2018 (2018)

11. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1480-1489 (2016)

12. Zhang, Y., Wallace, B.: A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 253-263. Asian Federation of Natural Language Processing (2017), http://aclweb.org/anthology/I17-1026