ELiRF-UPV at IberEval 2017: Stance and Gender Detection in Tweets José-Ángel González, Ferran Pla, Lluı́s-F. Hurtado Departament de Sistemes Informàtics i Computació Universitat Politècnica de València {jogonba2,fpla,lhurtado}@dsic.upv.es Abstract. This paper describes the participation of ELiRF-UPV team at the two Spanish subtasks of the Stance and Gender Detection in Tweets on Catalan Independence track of the IberEval workshop. We tested several approaches based on different models and tweet represen- tations. Our best approaches are based on neural networks with one-hot vector representation and Support Vector Machines using bag-of-ngrams of chars. We achieved the first place on the gender detection subtask and the fourth place on the stance detection subtask. Keywords: Neural Networks, Support Vector Machine, bag-of-words, one-hot vectors 1 Introduction Stance detection consist of automatically determining from text whether the author is in favor of the given target, against the given target, or whether neither inference is likely. Different international competitions have recently shown interest in these subjects: Stance on Twitter, task 6 at SemEval-2016 [5] and Gender detection at PAN@CLEF 2016 [8]. Stance and Gender detection in Tweets on Catalan Independence is one of the tracks proposed at Ibereval 2017 workshop [9]. The aim of this task is to detect the author’s gender and stance with respect to the target ”independence of Catalonia” in tweets written in Spanish and/or Catalan. 2 Corpus Description The corpus is composed by tweets labeled with respect to the independence of Catalonia (three classes: AGAINST, NEUTRAL, FAVOR) and with respect to the gender of the author of each tweet (two classes: MALE and FEMALE). These tweets are provided in Spanish and Catalan, however, we have only worked with the Spanish version of the proposed corpus. On the other hand, it is necessary to take into account that the corpus is unbalanced in terms of stance Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) detection, being a clear bias between classes AGAINST and NEUTRAL with respect to class FAVOR. This unbalance does not occur in the gender detection subtask as can be seen in Table 1. Table 1. Number of samples per class in the Spanish subset of the corpus. Male Female Total Against 753 693 1446 Neutral 1216 1322 2538 Favor 190 145 335 Total 2159 2160 4319 3 System Description In this section we describe the main characteristics of the system developed to the Stance and Gender Detection in Tweets on Catalan Independence track of the IberEval workshop. This description includes the preprocessing used, the different tweets representations used and, the different models that were taken into account during the tuning phase. 3.1 Preprocessing The preprocessing process of the tweets was a bit different depending on the subtasks. In both cases, we removed the accents and converted all the text to lowercase. The web links (URL), and the numbers were substituted by a specific label. We assumed that the hashtags, the emoticons and the mentions to other users would be informative to determine the opinions of a user but not his/her gender. Accordingly to this assumption, we substituted the hashtags, the emoticons and user’s mentions by a specific label for the Gender subtask, but we kept their values for the Stance subtask. 3.2 Tweets representation We considered different approaches to represent the tweets: – Embeddings. Sequential representations of words represented with embed- dings Word2Vec [3], [4] [7] learned from the Spanish version of Wikipedia [10]. – Bag-of-ngrams. We tested as features, unigrams and bigrams of words and chars using a bag-of-ngrams representation. – One-hot vectors. We also tested unigrams and bigrams of chars using a one- hot vector representation. 194 Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) Since stance detection may be related, in some way, to sentiment analysis, we tested the use of polarity lexicons for the Stance subtask. Specifically, we tried to include NRC lexicon [6] as extra features for stance detection. 3.3 Models We explored different models depending on the representation of the tweets. This way, Long short-term memory (LSTM) neural networks [2] assembled with Convolutional neural networks (CNN) were used to deal with the sequential representations (embeddings) and the one-hot vectors of chars representation, while for the bag-of-ngrams (both at the word level and at the char level) rep- resentation Support Vector Machines (SVM) with linear kernel and Multilayer Perceptrons (MLP) were used. The NRC polarity lexicon was used only with the embeddings representation. The topology used was similar to the one described in [1] but without the subnet dedicated to processing the sequences formed with embeddings obtained from the training corpus. 3.4 Tuning In order to select the representations and the models (including their parameters) more appropriated to each subtask, a tuning process was performed. The corpus provided by the organizers of the task was split into two sets, a set with the 80% of the tweet for learning the model and the remaining 20% of the corpus was used as tuning. The partitions were the same for all the tuning process. For tuning the models of each subtask, the official evaluation measure of each subtask was taken into account as optimization criterion. Faced with the impossibility of testing all combinations of models and rep- resentations, only those combinations we thought that made more sense were considered. Table 2 shows the most relevant combinations of features and mod- els as well as the results obtained during the tuning phase. Table 2. Results obtained in the tuning phase. System Features Stance (F1 ) Gender (Acc.) CNN+LSTM Embeddings 51.84 64.47% CNN+LSTM Emb+NRC 48.80 - CNN+LSTM One-Hot 55.10 - MLP Word 1-2grams - 59.72% MLP Char 2-grams - 63.81% SVM Word 1-2grams - 58.30% SVM Char 2-grams - 66.92% SVM Char 1-2-grams - 66.99% Regarding the stance detection subtask, as can be seen in the first row of Table 2, with sequential representation at word level (Wikipedia embeddings) 195 Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) the result obtained was 51.84 for F1 measure. Unfortunately, worse results were obtained when the polarity sequence according to the NRC lexicon was added to the embeddings representation (Emb+NRC). We also tried the use of a different sequential representation formed by one- hot vectors at character level and processed with the same network used in the previous experiments. This representation together with CNN + LSTM obtained the best results on tuning, 55.10 of F1 measure as can be seen in the third row of Table 2. Regarding the gender detection subtask, although the sequential representa- tion at word level (embeddings from Wikipedia) processed using CNN + LSTM obtained good results, 64.47% in Accuracy, the best results in the tuning phase were achieved by the representations based on bag-of-ngrams of chars. The rep- resentation based on bag-of-ngrams of words achieved significantly worst results. The models that obtained better results were the Support Vector Machines with linear kernel. Specifically, the SVM model using bag-of-unigram of chars as representation of the tweets achieved 66.92% of Accuracy; while adding bag- of-bigrams of chars to the previous model slightly increases the Accuracy to 66.99%. These results correspond to the last two rows in Table 2. 4 Results In view of the results obtained during the tuning phase and due the limitation of the track, we decided to send the following two runs to the competition. – run1 • stance detection: CNN + LSTM + char-one-hot • gender detection: SVM + bag-of-2grams of chars – run2 • stance detection: CNN + LSTM + char-one-hot (the same as in run1) • gender detection: SVM + bag-of-1grams of chars + bag-of-2grams of chars (the best accuracy at tunning) Table 3 and Table 4 show the official result obtained by our systems in the stance detection subtask and the gender detection subtask respectively. The po- sition obtained by our system in the competition is also included in parenthesis. Table 3. Oficial results for the Stance detection subtask. run System and Features F1 run1/run2 CNN+LSTM + char One-hot vectors 46.37 (4) Once we have analyzed the results, both in the tuning phase and in the official competition, we want to point out some interesting things. In both subtasks, methods based on deep-learning have shown to offer com- petitive results. However, in the case of gender detection subtask, the best results 196 Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) Table 4. Oficial results for the Gender detection subtask. run System and Features Acc. run1 SVM + bag-of-2grams of chars 68.55% (1) run2 SVM + bag-of-1grams + bag-of-2grams of chars 58.74% (14) have been obtained with a priori simple model, SVM and bag-of-chars. We hy- pothesize that the good results achieved by SVM models in this subtask is due to the greater robustness of these models (compared with deep-learning based models) to deal with the bias problem. If the imbalance is very large, it can cause that the network assigns all the samples only to the majority classes. The solu- tion used for the stance detection subclass was to perform a scaling of the loss function during the training phase. This has prevented the network from classi- fying all tweets in the AGAINST and NEUTRAL classes (the majority classes with much difference in the Spanish version of the stance detection corpus). Regarding the stance detection subtask, a sequential character-level represen- tation has been chosen due to the increasing interest this kind of representations are having in the deep-learning area and the good results they are achieving [11]. In this way, we have been able to verify that, effectively, this type of repre- sentations (in conjunction with neural networks that handle sequences) provide competitive results in text classification tasks such as the stance detection sub- task. 5 Conclusions and Future work We have presented the participation of the ELiRF-UPV team at the Stance and Gender Detection in Tweets on Catalan Independence track of the IberEval workshop. Our team has participated in the two Spanish subtasks of the track and has achieved competitive results. Our best approaches were based on neu- ral networks with sequential representation of the tweets and Support Vector Machines with bag-of-ngrams of chars. As future work, we plan to use representations based on one-hot vectors at character level and CNN + LSTM on other tweet classification problems (TASS, SemEval, ...) in order to study their behavior in tasks other than stance detection. Acknowledgements This work has been partially supported by the Spanish MINECO and FEDER founds under project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics, TIN2014-54288-C4-3-R. 197 Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017) References 1. González, J.A., Pla, F., Hurtado, L.F.: ELiRF-UPV at SemEval-2017 Task 4: Sen- timent Analysis using Deep Learning. In: Proceedings of the 11th International Workshop on Semantic Evaluation. pp. 722–726. SemEval ’17, Association for Com- putational Linguistics, Vancouver, Canada (August 2017) 2. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation 9(8), 1735–1780 (1997) 3. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word repre- sentations in vector space. CoRR abs/1301.3781 (2013), http://arxiv.org/abs/ 1301.3781 4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed repre- sentations of words and phrases and their compositionality. CoRR abs/1310.4546 (2013), http://arxiv.org/abs/1310.4546 5. Mohammad, S.M., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: Semeval-2016 task 6: Detecting stance in tweets. In: Proceedings of the International Workshop on Semantic Evaluation. SemEval ’16, San Diego, California (June 2016) 6. Mohammad, S.M., Turney, P.D.: Crowdsourcing a Word-Emotion Association Lex- icon 29(3), 436–465 (2013) 7. Řehůřek, R., Sojka, P.: Software Framework for Topic Modelling with Large Cor- pora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. pp. 45–50. ELRA, Valletta, Malta (May 2010), http://is.muni.cz/ publication/884893/en 8. Rosso, P., Rangel, F., Potthast, M., Stamatatos, E., Tschuggnall, M., Stein, B.: Overview of PAN’16, pp. 332–350. Springer International Publishing (2016) 9. Taulé, M., Martı́, M., Rangel, F., Rosso, P., Bosco, C., Patti, V.: Overview of the task of Stance and Gender Detection in Tweets on Catalan Independence at IBEREVAL 2017. In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017). CEUR Workshop Proceedings. CEUR-WS.org, 2017, Murcia (Spain) (September 2017) 10. Wikipedia: Wikipedia spanish dumps (2017), https://dumps.wikimedia.org/ eswiki/, [Online; accessed 18-May-2017] 11. Zhang, X., Zhao, J., LeCun, Y.: Character-level Convolutional Networks for Text Classification. In: Proceedings of the 28th International Conference on Neural In- formation Processing Systems. pp. 649–657. NIPS’15, MIT Press, Cambridge, MA, USA (2015) 198