=Paper= {{Paper |id=Vol-1881/StanceCat2017_paper_2 |storemode=property |title=iTACOS at IberEval2017: Detecting Stance in Catalan and Spanish Tweets |pdfUrl=https://ceur-ws.org/Vol-1881/StanceCat2017_paper_2.pdf |volume=Vol-1881 |authors=Mirko Lai,Alessandra Teresa Cignarella,Delia Irazú Hernández Farías |dblpUrl=https://dblp.org/rec/conf/sepln/LaiCF17 }} ==iTACOS at IberEval2017: Detecting Stance in Catalan and Spanish Tweets== https://ceur-ws.org/Vol-1881/StanceCat2017_paper_2.pdf
    iTACOS at IberEval2017: Detecting Stance in
           Catalan and Spanish Tweets

 Mirko Lai¹,², Alessandra Teresa Cignarella¹, Delia Irazú Hernández Farías¹,²
              ¹ Dipartimento di Informatica, Università degli Studi di Torino
              ² PRHLT Research Center, Universitat Politècnica de València



          Abstract. In this paper we describe the iTACOS submission to the
          Stance and Gender Detection in Tweets on Catalan Independence shared
          task. In stance detection, we ranked first in both languages,
          outperforming the baselines; in gender detection, we ranked fourth for
          Catalan and third for Spanish. Our approach is based on three groups of
          features: stylistic, structural and context-based. We introduced two
          novel features that exploit significant characteristics conveyed by the
          presence of Twitter marks and URLs. The results of our experiments are
          promising and will lead us to tailor these two features in a
          finer-grained manner in future work.


1       Introduction
Recently, there has been special interest in monitoring people's stance towards
particular targets, which has led to a new area of investigation named Stance
Detection (SD). Research on this topic could have a positive impact on areas
such as public administration, policy-making, and security. By constantly
monitoring people's opinions, desires, complaints and beliefs about the political
agenda or public services, administrators could better meet the population's
needs. For example, SD could improve the automatic identification of people's
extremist tendencies (e.g. religious extremism [1]).
    In 2016, the first shared task on SD was held at SemEval-2016: Task 6,
Detecting Stance in Tweets³. Participating teams were required to determine
stance towards six different targets: “Atheism”, “Climate Change is a Real
Concern”, “Donald Trump”, “Feminist Movement”, “Hillary Clinton”, and
“Legalization of Abortion”. Most of the proposed approaches exploited standard
text classification features such as n-grams, as well as word embeddings. More
details about the participating systems can be found in [2]. In general, related
work on SD is scarce, since only a few works have been published on this new
task. Mohammad et al. [3] took advantage of word-based and sentiment-based
features to perform SD on the SemEval-2016 Task 6 dataset. Lai et al. [4],
instead, proposed an approach using context features to detect stance towards
two targets related to politics in the U.S. presidential elections: Hillary
Clinton and Donald Trump. The obtained results outperformed those of the shared
task.

    ³ http://alt.qcri.org/semeval2016/task6/
Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)

             In this paper we present our participation in the Stance and Gender
         Detection in Tweets on Catalan Independence task [5] at IberEval-2017⁴. The
         task consists of two subtasks on Twitter messages written in Catalan and
         Spanish: the first subtask is detecting the author's stance towards the
         independence of Catalonia, while the second one aims at identifying the
         author's gender.
             Inferring people's traits such as gender, age or native language from
         their written texts is investigated by a field named Author Profiling (AP).
         Since 2013, a shared task on AP has been organized at PAN [6,7,8,9] in the
         framework of CLEF⁵. The intuition behind gender recognition is to study how
         people use language and to identify features, devices or patterns that are
         more likely to be exploited by one gender or the other. More details on the
         state-of-the-art approaches to this task can be found in [9,10].


         2        Our proposal
         The starting point of our proposal is the method proposed by Lai et al. [4],
         in which the authors exploited three groups of features: Structural, such as
         punctuation and other Twitter marks; Sentiment, i.e. lexica covering
         different facets of affect; and Context-based, which consider the
         relationship between a given target and other entities in its domain.
             We propose a supervised approach for determining both the stance towards
         the independence of Catalonia and the gender of the author of a given tweet.
         The features we explored can be grouped in three main categories:
         Stylistic, Structural, and Context. Unlike [4], we did not explore Sentiment
         features, because we are not aware of sentiment lexica for Spanish and
         Catalan. We define the following set of features:

             • Stylistic Features
                − Bag of Words (BoW)⁶
                − Bag of Part-of-Speech labels (BoP)⁶,⁷
                − Bag of Lemmas (BoL)⁶,⁷
                − Bag of Char-grams (BoC)⁸

             • Structural Features
                − Bag of Twitter Marks (BoTM). We build a bag of words from the
                  words extracted from multi-word Twitter marks (hashtags and
                  mentions), splitting them at capital letters.
                − Bag of Hashtags (BoH). We use the hashtags as terms to build a
                  vector with binary representation.
                − Frequency of Hashtags (freqHash).
                − Uppercase Words (UpW). The number of words starting with a
                  capital letter.
                − Punctuation Marks (PM). The frequency of dots, commas,
                  semicolons, exclamation marks and question marks.
                − Length (Length). A vector of three features: the number of
                  words, the number of characters, and the average word length in
                  each tweet.

             • Context Features
                − Language (Lan). A vector built from the labels es (Spanish) and
                  ca (Catalan) provided by the organizers.
                − URL (Url). We observed that tweets containing a URL are common
                  in the training dataset, and we took advantage of this by
                  considering different aspects of the short URLs. First, we
                  checked whether the referenced web address is reachable. Second,
                  we retrieved the words contained in the web address and built a
                  bag of words from them.

              ⁴ http://stel.ub.edu/Stance-IberEval2017/
              ⁵ http://clef2017.clef-initiative.eu/
              ⁶ Each tweet was pre-processed by converting it to lowercase. We used
                unigrams, bigrams and trigrams with a binary representation.
              ⁷ We used TreeTagger [11,12] to extract both the part-of-speech labels
                and the lemmas.
              ⁸ We considered char-grams of 2 and 3 characters.
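Most of the features listed above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the iTACOS implementation: the function names, the regular expressions that find hashtags and mentions, the rule used to split Twitter marks at capital letters, and the removal of URLs before counting punctuation are all our assumptions. BoP and BoL are omitted because they need a tagger and lemmatizer such as TreeTagger, and the Url reachability check is omitted because it needs network access; only the extraction of words from an already-expanded URL is shown.

```python
import re
from statistics import mean

# PM counts dots, commas, semicolons, exclamation and question marks.
PUNCTUATION = [".", ",", ";", "!", "?"]
URL_RE = re.compile(r"https?://\S+")
# Assumed splitting rule: runs of capitals ("NO"), capitalized or lowercase words, digits.
MARK_PART_RE = re.compile(r"[A-ZÀ-Þ]+(?![a-zß-ÿ])|[A-ZÀ-Þ]?[a-zß-ÿ]+|\d+")


def word_ngrams(tokens, sizes=(1, 2, 3)):
    """Binary bag of word n-grams: the set of unigrams, bigrams and trigrams present."""
    return {" ".join(tokens[i:i + n]) for n in sizes for i in range(len(tokens) - n + 1)}


def char_ngrams(text, sizes=(2, 3)):
    """Binary bag of character n-grams of length 2 and 3."""
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}


def split_twitter_mark(mark):
    """Split a hashtag or mention at capital letters: '#JuntsPelSi' -> ['junts', 'pel', 'si']."""
    return [part.lower() for part in MARK_PART_RE.findall(mark.lstrip("#@"))]


def stylistic_features(text):
    """BoW and BoC computed on the lowercased tweet (BoP and BoL would need a tagger)."""
    lowered = text.lower()
    return {"BoW": word_ngrams(lowered.split()), "BoC": char_ngrams(lowered)}


def structural_features(text):
    """Hypothetical versions of BoTM, BoH, freqHash, UpW, PM and Length for one tweet."""
    tokens = text.split()
    hashtags = re.findall(r"#\w+", text)
    marks = re.findall(r"[#@]\w+", text)  # hashtags and mentions
    no_urls = URL_RE.sub(" ", text)  # assumption: dots inside URLs are not punctuation
    return {
        "BoTM": {word for mark in marks for word in split_twitter_mark(mark)},
        "BoH": set(hashtags),
        "freqHash": len(hashtags),
        "UpW": sum(1 for token in tokens if token[:1].isupper()),
        "PM": [no_urls.count(p) for p in PUNCTUATION],
        "Length": [len(tokens), len(text),
                   mean(len(token) for token in tokens) if tokens else 0.0],
    }


def url_words(expanded_url):
    """Hypothetical Url helper: the words of an already-expanded URL (no reachability check)."""
    stop = {"http", "https", "www", "com", "html"}
    return [w for w in re.findall(r"[a-zà-ÿ]+", expanded_url.lower()) if w not in stop]
```

For example, `structural_features("VOTA @catsiqueespot #27S #NO!")` yields two hashtags, one uppercase word and one exclamation mark, while `split_twitter_mark("#JuntsPelSi")` recovers the words `junts`, `pel` and `si`. Each returned set or count would then be vectorized (binary for the bags) before being passed to a classifier.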


         3      Experiments and Results

         The organizers provided a dataset of 8,638 tweets written in Spanish and
         Catalan, labelled with stance (against, favor, and neutral) and gender
         (female and male). The gender distribution is balanced between female and
         male tweets. Regarding stance, the distribution is skewed towards favor for
         Catalan and towards neutral for Spanish (30.66% and 29.38%, respectively).
         Similar trends were found in Bosco et al. [13].
             Since the stance distribution differs between the two languages, language
         could be a useful feature for stance detection in the Catalan independence
         debate, which concerns a region characterized by strong bilingualism and
         smoldering nationalism. Indeed, “Language divides and unites us. It [...]
         impinges upon our identity as individuals, as members of a particular ethnic
         or national group, and as citizens of a given polity” [14]. We therefore
         believe that there is a strong correlation between stance and the choice of
         language.
             To assess the performance of the participating systems, a test set of
         2,162 unlabelled tweets was provided, and the two subtasks were evaluated
         separately with two different metrics: (1) the macro-average of the F-scores
         of the favor and against classes for stance detection, and (2) accuracy for
         gender identification.
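The stance metric averages the F-scores of only two of the three classes: neutral has no score of its own and affects the result only through the false positives and false negatives it causes for favor and against. A small pure-Python sketch of both metrics, assuming lowercase label strings; it illustrates the definitions and is not the organizers' official scorer.

```python
def f1_for_label(gold, predicted, label):
    """F1 of a single class, treating every other label as negative."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stance_macro_f(gold, predicted):
    """Average of the F1 of 'favor' and 'against'; 'neutral' is not averaged in."""
    return (f1_for_label(gold, predicted, "favor")
            + f1_for_label(gold, predicted, "against")) / 2


def accuracy(gold, predicted):
    """Fraction of tweets whose predicted label (e.g. gender) matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

With gold labels `["favor", "against", "neutral", "favor"]` and predictions `["favor", "favor", "neutral", "against"]`, favor gets F1 = 0.5 and against gets 0, so the stance score is 0.25 even though the neutral tweet was classified correctly.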








         3.1      iTACOS experiments
         In our experiments, we addressed both stance and gender detection as
         classification tasks. The code is available on GitHub to allow further
         exploration and reproducibility of our experiments⁹. We carried out several
         experiments¹⁰ by combining the features introduced in Section 2 with a set of
         classifiers: Support Vector Machine (SVM), Random Forest (RF), Logistic
         Regression (LR), Decision Tree (DT), and Multinomial Naïve Bayes (MNB). We
         also exploited a Majority Voting (MV) strategy over the predictions of the
         above-mentioned classifiers, as described in Liakata et al. [15]. The
         features proposed in Section 2 were exploited for both stance and gender
         detection; however, as described in the results section, they were
         specifically tailored to stance detection and then also applied to gender.
         For this reason, in the present paper we focus mainly on the first subtask,
         stance detection. We analyzed the obtained results and selected the five
         combinations of features that performed best for stance detection. The
         resulting sets of features are shown in Table 1.
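Majority voting, as used in the MV runs and in the 5x5 and All Sets combinations of Table 2, amounts to taking the most frequent label per tweet across several aligned prediction lists. A minimal Python sketch; the function name and the tie-breaking rule (ties go to the label of the earliest model in the list) are our assumptions, since the paper does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Return, for each tweet, the most frequent label across models.

    predictions_per_model: one list of labels per (feature set, classifier) pair,
    all aligned on the same tweets. Counter.most_common orders equal counts by
    first occurrence, so a tie goes to the label of the earliest model listed.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions_per_model)]
```

For a 5x5 run the outer list would hold 25 prediction lists, one per feature-set/classifier pair; for the All Sets + SVM run it would hold 5.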
              We participated in the shared task with five different runs for each
         language and each subtask. Table 2 shows the results obtained on the
         training set, together with the features and the classifier used in each of
         the submitted runs.

                        Table 1. Best-ranked sets of features using the training set

                       Name        Features list
                       Set α       BoW, BoL, BoC, Url, BoTM, freqHash, UpW
                       Set β       BoW, BoL, BoP, BoC, Url, BoH, freqHash, Length
                       Set γ       BoW, BoL, BoP, BoC, Url, freqHash, Lan, Length
                       Set δ       BoW, BoL, BoP, BoC, Url, freqHash, PM, Length
                       Set ε       BoW, BoL, BoP, BoC, Url, BoH, PM, Lan




                  Table 2. Results for stance and gender detection on the training set

                            Stance Detection                      Gender Detection
         Run        Features and         F-score         Features and           Accuracy
                    classifier        Catalan  Spanish   classifier          Catalan  Spanish
         iTACOS.1   Set α + SVM       0.680    0.544     Set ε + LR          0.720    0.648
         iTACOS.2   Set ε + LR        0.633    0.544     Set δ + LR          0.722    0.648
         iTACOS.3   Set β + LR        0.625    0.548     5x5*                0.728    0.656
         iTACOS.4   5x5*              0.636    0.530     Set α + MV          0.719    0.646
         iTACOS.5   Set α + MV        0.657    0.548     All Sets** + SVM    0.709    0.636
         *  The final prediction is the most frequent prediction over the 25
            combinations of feature sets and machine learning algorithms.
         ** The final prediction is the most frequent prediction over the 5
            combinations of feature sets with SVM.

            ⁹ https://github.com/mirkolai/iTACOS-at-IberEval2017
           ¹⁰ A 10-fold cross-validation setting was used.


         3.2      Official results
         We ranked first among 10 participating teams in the stance detection subtask
         in both Catalan and Spanish. Table 3 shows the official results on the test
         set. Our approach performed slightly better in Catalan than in Spanish: all
         five of our Catalan runs ranked among the first 8 positions, whereas in
         Spanish our lowest-performing run ranked 18th. The best result in each
         language was not achieved by the same run: iTACOS.2 performed best for
         Catalan, while iTACOS.1 performed best for Spanish. The poorest results in
         both languages were obtained by iTACOS.4 and iTACOS.5.

                               Table 3. Official results for stance detection

                        Catalan                                Spanish
              Ranking   Run        F-score          Ranking   Run        F-score
              1         iTACOS.2   0.4901           1         iTACOS.1   0.4888
              2         iTACOS.1   0.4885           7         iTACOS.2   0.4593
              4         iTACOS.3   0.4685           12        iTACOS.3   0.4528
              7         iTACOS.4   0.4490           14        iTACOS.4   0.4427
              8         iTACOS.5   0.4484           18        iTACOS.5   0.4293

             As expected, the best-performing runs (iTACOS.1 and iTACOS.2) contain
         context-based features, supporting the importance of considering contextual
         information in stance detection. For example, both runs include the Url
         feature. To evaluate the impact of this feature on performance, we carried
         out experiments on the training set with modified versions of iTACOS.1 and
         iTACOS.2 that do not include the Url feature. We observed a drop in
         performance of 0.029% for Catalan and 0.002% for Spanish in iTACOS.1, and
         of 0.004% for Catalan and 0.002% for Spanish in iTACOS.2.
             BoTM, a novel feature included in the structural group, emerges among the
         relevant features in iTACOS.1 for Spanish, but further inquiry into its
         relevance is a matter of future work. As for classifiers, LR and SVM
         achieved the best performance in both languages. Surprisingly, the MV
         approach did not perform well.
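The training-set results (Table 2 and the Url ablation) rely on 10-fold cross-validation. The sketch below shows one plain way to build such folds in Python; the function name, shuffling with a fixed seed, and not stratifying by label or language are our assumptions, since the paper does not describe how the folds were built.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: a plain shuffled (non-stratified) split; every index appears in
    exactly one test fold, and fold sizes differ by at most one.
    """
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

With the 8,638 training tweets, this gives eight test folds of 864 tweets and two of 863; a model is trained on each `train` split, scored on the matching `test` split, and the ten scores are averaged.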


         3.3    A linguistic revision

         A fundamental part of our approach was the manual inspection of the data.
         Since the dataset is very large, we could only inspect a small portion of
         the tweets. We therefore focused on the cases of disagreement between the
         predictions of iTACOS.1 and the gold labels provided by the organizers¹¹.
         Below, we report some examples in both Catalan and Spanish:

              1. #elecciones #catalunya #NO #27S https://t.co/oBuTDnUEHj
              → #elecciones #catalunya #NO #27S https://t.co/oBuTDnUEHj
              language: Catalan
              gold label: against
              iTACOS.1: favor

              2. Ale @JuntsPelSi, a casa, son solo unas #eleccionescatalanas autonómicas.
              Mañana a trabajar que es lunes. Seguís teniendo el mismo DNI. #27S
              → Come on @JuntsPelSi, go home, these are just regional
              #eleccionescatalanas. Tomorrow it's back to work, it's Monday. You still
              have the same DNI (Spanish ID). #27S
              language: Spanish
              gold label: against
              iTACOS.1: favor

              3. En estas #eleccionescatalanas de decide una posible independencia y un
              gobierno que vele por los derechos de su pueblo, VOTA @catsiqueespot
              → In these #eleccionescatalanas we decide on a possible independence and
              on a government that watches over the rights of its people, VOTE
              @catsiqueespot
              language: Spanish
              gold label: favor
              iTACOS.1: against

         Example 1 was classified as favor by iTACOS.1, probably because of the
         misleading presence of the token “catalunya”, written in Catalan. However,
         the explicit semantic information carried by the hashtag #NO, pointing to
         against, was ignored, leading to a wrong classification. For Spanish,
         example 2 was labelled favor instead of against: the mention @JuntsPelSi
         (the Catalan independence coalition) may have misled our classifier.
         Conversely, the tweet in example 3 was tagged against, whereas it should
         have been favor, as can clearly be inferred from “VOTA @catsiqueespot” and
         as stated by the gold label.
             This kind of manual analysis helped us shed light on the relevance of
         each feature we exploited and, after analyzing the features linguistically,
         choose which ones to include in our final sets.
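Selecting the disagreement cases for this kind of inspection is straightforward once predictions are available for the training tweets. The hypothetical helper below groups misclassified tweets by language, gold label and predicted label, mirroring how the examples above are reported; the function name and the grouping key are our choices, not something the paper prescribes.

```python
from collections import defaultdict


def group_disagreements(tweets, gold, predicted, languages):
    """Collect tweets whose predicted label differs from the gold label.

    Groups are keyed by (language, gold label, predicted label), so that, for
    instance, all Spanish tweets labelled 'against' but predicted 'favor' can be
    read together.
    """
    groups = defaultdict(list)
    for text, g, p, lang in zip(tweets, gold, predicted, languages):
        if g != p:
            groups[(lang, g, p)].append(text)
    return dict(groups)
```

Applied to the three examples above plus one correctly classified tweet, it returns three groups of one tweet each and leaves the correct prediction out.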


         4        Conclusions

         In this paper we presented an overview of the iTACOS submission to the
         Stance and Gender Detection in Tweets on Catalan Independence task at
         IberEval-2017. We submitted five different runs for detecting the author's
         stance and gender in Twitter messages in both Catalan and Spanish. Our
         approach, chiefly based on context and structural features, proved highly
         successful for stance detection in both languages, as our system ranked
         first among ten participating teams. The results show that the addition of
         two particular features, namely BoTM and Url, made a significant
         contribution to the Stance Detection task. In the future, we plan to tailor
         these two features in an even finer-grained manner.

             ¹¹ The tweets have been extracted from the training set.


         References
          1. Hogan, B.: The Presentation of Self in the Age of Social Media: Distinguishing
             Performances and Exhibitions Online. Bulletin of Science, Technology & Society
             30 (2010) 377–386
          2. Bethard, S., Cer, D.M., Carpuat, M., Jurgens, D., Nakov, P., Zesch, T., eds.:
             Proceedings of the 10th International Workshop on Semantic Evaluation,
             SemEval@NAACL-HLT 2016, San Diego, CA, USA, June 16-17, 2016. The
             Association for Computer Linguistics (2016)
          3. Mohammad, S.M., Sobhani, P., Kiritchenko, S.: Stance and Sentiment in Tweets.
             CoRR abs/1605.01655 (2016)
          4. Lai, M., Hernandez Farias, D.I., Patti, V., Rosso, P.: Friends and Enemies of
             Clinton and Trump: Using Context for Detecting Stance in Political Tweets. In
             Sidorov, G., Herrera-Alcántara, O., eds.: Advances in Computational
             Intelligence. 15th Mexican International Conference on Artificial
             Intelligence, MICAI 2016, Part I. Lecture Notes in Artificial Intelligence,
             Volume 10061 (2016) 152–165
          5. Taulé, M., Martí, M.A., Rangel Pardo, F.M., Rosso, P., Bosco, C., Patti, V.:
             Overview of the task of Stance and Gender Detection in Tweets on Catalan
             Independence at IberEval 2017. In: Proceedings of the Second Workshop on
             Evaluation of Human Language Technologies for Iberian Languages (IberEval
             2017), Murcia, Spain. CEUR Workshop Proceedings, CEUR-WS.org (2017)
          6. Rangel Pardo, F.M., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview
             of the author profiling task at PAN 2013. In: CLEF Conference on Multilingual
             and Multimodal Information Access Evaluation, CELCT (2013) 352–365
          7. Rangel Pardo, F.M., Rosso, P., Potthast, M., Trenkmann, M., Stein, B.,
             Verhoeven, B., Daelemans, W., et al.: Overview of the 2nd Author Profiling
             Task at PAN 2014. In: CEUR Workshop Proceedings, Volume 1180 (2014) 898–927
          8. Rangel Pardo, F.M., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview
             of the 3rd Author Profiling Task at PAN 2015. In: CLEF (2015)
          9. Rangel Pardo, F.M., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein,
             B.: Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations.
             In Balog, K., Cappellato, L., Ferro, N., Macdonald, C., eds.: CLEF 2016 Labs
             and Workshops, Notebook Papers. CEUR Workshop Proceedings. Volume 1609.,
             Évora, Portugal (2016) 750–784
         10. Rangel Pardo, F.M., Rosso, P.: Use of language and author profiling: Identification
             of gender and age. Natural Language Processing and Cognitive Science 177 (2013)
         11. Schmid, H.: Part-of-speech tagging with neural networks. In: Proceedings of the
             15th conference on Computational linguistics-Volume 1, Association for Compu-
             tational Linguistics (1994) 172–176
         12. Schmid, H.: TreeTagger: a language independent part-of-speech tagger.
             Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart 43 (1995) 28








         13. Bosco, C., Lai, M., Patti, V., Rangel Pardo, F.M., Rosso, P.: Tweeting in the
             Debate about Catalan Elections. In Calzolari, N., Choukri, K., Declerck, T., Goggi,
             S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J.,
             Piperidis, S., eds.: LREC Workshop on Emotion and Sentiment Analysis (ESA),
             LREC-2016, Portorož, Slovenia, European Language Resources Association
             (ELRA) (2016) 67–70
         14. Millar, R.: Language, Nation and Power: An Introduction. Springer (2005)
         15. Liakata, M., Kim, J.H., Saha, S., Hastings, J., Rebholz-Schuhmann, D.: Three
             hybrid classifiers for the detection of emotions in suicide notes. Biomedical
             Informatics Insights 5 (2012) 175



