Stance Detection in Turkish Tweets

Dilek Küçük
Electrical Power Technologies Group
TÜBİTAK Energy Institute
Ankara, Turkey
dilek.kucuk@tubitak.gov.tr

ABSTRACT
Stance detection is a classification problem in natural language processing where for a text and target pair, a class result from the set {Favor, Against, Neither} is expected. It is similar to the sentiment analysis problem but instead of the sentiment of the text author, the stance expressed for a particular target is investigated in stance detection. In this paper, we present a stance detection tweet data set for Turkish comprising stance annotations of tweets for two popular sports clubs as targets. Additionally, we provide the evaluation results of SVM classifiers for each target on this data set, where the classifiers use unigram, bigram, and hashtag features. This study is significant as it presents one of the initial stance detection data sets proposed so far and the first one for the Turkish language, to the best of our knowledge. The data set and the evaluation results of the corresponding SVM-based approaches will form plausible baselines for the comparison of future studies on stance detection.

CCS CONCEPTS
• Information systems → Sentiment analysis; Web and social media search; • Computing methodologies → Language resources;

KEYWORDS
Stance detection, Turkish, social media analysis, SVM, unigrams

Reference format:
Dilek Küçük. 2017. Stance Detection in Turkish Tweets. In Proceedings of Workshop on Social Media World Sensors (SIDEWAYS), Prague, Czech Republic, July 2017 (SIDEWAYS’17), 4 pages, CEUR-WS.org.

SIDEWAYS’17, Prague, Czech Republic. Copyright held by the author(s).

1 INTRODUCTION
Stance detection (also called stance identification or stance classification) is one of the considerably recent research topics in natural language processing (NLP). It is usually defined as a classification problem where for a text and target pair, the stance of the author of the text for that target is expected as a classification output from the set: {Favor, Against, Neither} [12].

Stance detection is usually considered a subtask of the sentiment analysis (opinion mining) [13] topic in NLP. Both are mostly performed on social media texts, particularly on tweets, hence both are important components of social media analysis. Nevertheless, in sentiment analysis, the sentiment of the author of a piece of text, usually as Positive, Negative, or Neutral, is explored, while in stance detection, the stance of the author of the text for a particular target (an entity, event, etc.) either explicitly or implicitly referred to in the text is considered. Like sentiment analysis, stance detection systems can be valuable components of information retrieval and other text analysis systems [12].

Previous work on stance detection includes [16], where a stance classifier based on sentiment and arguing features is proposed in addition to an automatically compiled arguing lexicon. The ultimate approach performs better than distribution-based and unigram-based baseline systems [16]. In [17], the authors show that the use of dialogue structure improves stance detection in on-line debates. In [7], Hasan and Ng carry out stance detection experiments using different machine learning algorithms, training data sets, features, and inter-post constraints in on-line debates, and draw insightful conclusions based on these experiments. For instance, they find that sequence models like HMMs perform better at stance detection when compared with non-sequence models like Naive Bayes (NB) [7]. In another related study [10], the authors conclude that topic-independent features can be exploited for disagreement detection in on-line dialogues. The employed features include agreement, cue words, denial, hedges, duration, polarity, and punctuation [10]. Stance detection on a corpus of student essays is considered in [5]. After using linguistically-motivated feature sets together with multivalued NB and SVM as the learning models, the authors conclude that they outperform two baseline approaches [5]. In [4], the author claims that Wikipedia can be used to determine stances about controversial topics based on their previous work regarding controversy extraction on the Web.

Among more recent related work, in [1] stance detection for unseen targets is studied and bidirectional conditional encoding is employed. The authors state that their approach achieves state-of-the-art performance rates [1] on the SemEval 2016 Twitter Stance Detection corpus [12]. In [3], a stance-community detection approach called SCIFNET is proposed. SCIFNET automatically creates networks of people who are stance targets from the related document collections [3], using stance expansion and refinement techniques to arrive at stance-coherent networks. A tweet data set annotated with stance information regarding six predefined targets is proposed in [11], where this data set is annotated through crowdsourcing. The authors indicate that the data set is also annotated with sentiment information in addition to stance, so it can help reveal associations between stance and sentiment [11]. Lastly, in [12], SemEval 2016’s aforementioned shared task on Twitter Stance Detection is described. Also provided are the results of the evaluations of 19 systems participating in two subtasks (one with training data set provided and the other without an annotated data set) of the shared task [12].

In this paper, we present a tweet data set in Turkish annotated with stance information, where the corresponding annotations are made publicly available. The domain of the tweets comprises two popular football clubs which constitute the targets of the tweets included. We also provide the evaluation results of SVM classifiers (for each target) on this data set using unigram, bigram, and hashtag features.

To the best of our knowledge, the current study is the first one to target stance detection in Turkish tweets. Together with the provided annotated data set and the corresponding evaluations with the aforementioned SVM classifiers which can be used as baseline systems, our study will hopefully help increase social media analysis studies on Turkish content.

The rest of the paper is organized as follows: In Section 2, we describe our tweet data set annotated with the target and stance information. Section 3 includes the details of our SVM-based stance classifiers and their evaluation results with discussions. Section 4 includes future research topics based on the current study, and finally Section 5 concludes the paper with a summary.

2 A STANCE DETECTION DATA SET
We have decided to consider tweets about popular sports clubs as our domain for stance detection. Considerable amounts of tweets are being published for sports-related events at every instant. Hence we have determined our targets as Galatasaray (namely, Target-1) and Fenerbahçe (namely, Target-2), which are two of the most popular football clubs in Turkey. As is the case for the sentiment analysis tools, the outputs of the stance detection systems on a stream of tweets about these clubs can facilitate the use of the opinions of the football followers by these clubs.

In a previous study on the identification of public health-related tweets, two tweet data sets in Turkish (each set containing 1 million random tweets) have been compiled where these sets belong to two different periods of 20 consecutive days [9]. We have decided to use one of these sets (corresponding to the period between August 18 and September 6, 2015) and firstly filtered the tweets using the possible names used to refer to the target clubs. Then, we have annotated the stance information in the tweets for these targets as Favor or Against. Within the course of this study, we have not considered those tweets in which the target is not explicitly mentioned, as our initial filtering process reveals.

For the purposes of the current study, we have not annotated any tweets with the Neither class. This stance class and even finer-grained classes can be considered in further annotation studies. We should also note that in a few tweets, the target of the stance was the management of the club, while in some others a particular footballer of the club is praised or criticised. Still, we have considered the club as the target of the stance in all of the cases and carried out our annotations accordingly.

At the end of the annotation process, we have annotated 700 tweets, where 175 tweets are in favor of and 175 tweets are against Target-1, and similarly 175 tweets are in favor of and 175 are against Target-2. Hence, our data set is a balanced one although it is currently limited in size. The corresponding stance annotations are made publicly available at http://ceng.metu.edu.tr/~e120329/Turkish_Stance_Detection_Tweet_Dataset.csv in Comma Separated Values (CSV) format. The file contains three columns with the corresponding headers. The first column is the tweet id of the corresponding tweet, the second column contains the name of the stance target, and the last column includes the stance of the tweet for the target as Favor or Against.

To the best of our knowledge, this is the first publicly-available stance-annotated data set for Turkish. Hence, it is a significant resource as there is a scarcity of annotated data sets, linguistic resources, and NLP tools available for Turkish. Additionally, to the best of our knowledge, it is also significant for being the first stance-annotated data set including sports-related tweets, as previous stance detection data sets mostly include on-line texts on political/ethical issues.

3 STANCE DETECTION EXPERIMENTS USING SVM CLASSIFIERS
It is emphasized in the related literature that unigram-based methods are reliable for the stance detection task [16] and similarly unigram-based models have been used as baseline models in studies such as [12]. In order to be used as a baseline and reference system for further studies on stance detection in Turkish tweets, we have trained two SVM classifiers (one for each target) using unigrams as features. Before the extraction of unigrams, we have employed automated preprocessing to filter out the stopwords in our annotated data set of 700 tweets. The stopword list used is the list presented in [8] which, in turn, is the slightly extended version of the stopword list provided in [2].

We have used the SVM implementation available in the Weka data mining application [6], where this particular implementation employs the SMO algorithm [14] to train a classifier with a linear kernel. The 10-fold cross-validation results of the two classifiers are provided in Table 1 using the metrics of precision, recall, and F-Measure.

Table 1: Evaluation Results of the Unigram-based SVM Classifiers

Target     Class     Precision (%)   Recall (%)   F-Measure (%)
Target-1   Favor     75.2            92.0         82.8
           Against   89.7            69.7         78.5
           Average   82.5            80.9         80.6
Target-2   Favor     68.5            83.4         75.3
           Against   78.8            61.7         69.2
           Average   73.7            72.6         72.2

The evaluation results are quite favorable for both targets and particularly higher for Target-1, considering the fact that they are the initial experiments on the data set.
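
The per-class figures in Table 1 follow the usual definitions of precision, recall, and F-Measure. As a minimal sketch in Python (not the Weka tooling used in this study), the block below computes these metrics from the counts of correctly classified tweets per class. The counts themselves are not reported in the paper: they are inferred here under the assumption that each class contains 175 tweets and that the rounded recall values in Table 1 correspond to whole tweet counts. The function and variable names are illustrative only.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F-Measure for one class, as fractions in [0, 1]."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def as_percent(values):
    """Convert fractions to percentages rounded to one decimal place."""
    return tuple(round(100 * v, 1) for v in values)


def report(favor_correct, against_correct, per_class=175):
    """Per-class and averaged metrics for a two-class (Favor/Against) task.

    per_class=175 matches the balanced data set described in Section 2.
    """
    favor_missed = per_class - favor_correct      # Favor tweets labelled Against
    against_missed = per_class - against_correct  # Against tweets labelled Favor
    # False positives of one class are the misses of the other class.
    favor = class_metrics(favor_correct, against_missed, favor_missed)
    against = class_metrics(against_correct, favor_missed, against_missed)
    # With equal class sizes, the weighted and unweighted averages coincide.
    average = tuple((f + a) / 2 for f, a in zip(favor, against))
    return {
        "Favor": as_percent(favor),
        "Against": as_percent(against),
        "Average": as_percent(average),
    }


# Inferred counts (an assumption, not reported in the paper).
target_1 = report(favor_correct=161, against_correct=122)
target_2 = report(favor_correct=146, against_correct=108)

for name, result in (("Target-1", target_1), ("Target-2", target_2)):
    for label, (p, r, f) in result.items():
        print(f"{name} {label:8s} P={p} R={r} F={f}")
```

Under these inferred counts, the sketch reproduces the rounded values listed in Table 1 for both targets, which suggests that the reported averages are plain means of the two per-class values.
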
The performance of the classifiers is better for the Favor class for both targets when compared with the performance results for the Against class. This outcome may be due to the common use of some terms when expressing positive stance towards sports clubs in Turkish tweets. The same percentage of common terms may not have been observed in tweets during the expression of negative stances towards the targets. Yet, completely the opposite pattern is observed in stance detection results of baseline systems given in [12], i.e., better F-Measure rates have been obtained for the Against class when compared with the Favor class [12]. Some of the baseline systems reported in [12] are SVM-based systems using unigrams and ngrams as features similar to our study, but their data sets include all three stance classes of Favor, Against, and Neither, while our data set comprises only tweets classified as belonging to the Favor or Against classes. Another difference is that the data sets in [12] have been divided into training and test sets, while in our study we provide 10-fold cross-validation results on the whole data set. On the other hand, we should also note that SVM-based sentiment analysis systems (such as those given in [15]) have been reported to achieve better F-Measure rates for the Positive sentiment class when compared with the results obtained for the Negative class. Therefore, our evaluation results for each stance class seem to be in line with such sentiment analysis systems. Yet, further experiments on the extended versions of our data set should be conducted and the results should again be compared with the stance detection results given in the literature.

We have also evaluated SVM classifiers which use only bigrams as features, as ngram-based classifiers have been reported to perform better for the stance detection problem [12]. However, we have observed that using bigrams as the sole features of the SVM classifiers leads to quite poor results. This observation may be due to the relatively limited size of the tweet data set employed. Still, we can conclude that unigram-based features lead to superior results compared to the results obtained using bigrams as features, based on our experiments on our data set. Yet, ngram-based features may be employed on the extended versions of the data set to verify this conclusion within the course of future work.

With an intention to exploit the contribution of hashtag use to stance detection, we have also used the existence of hashtags in tweets as an additional feature to unigrams. The corresponding evaluation results of the SVM classifiers using unigrams together with the existence of hashtags as features are provided in Table 2.

Table 2: Evaluation Results of the SVM Classifiers Utilizing Unigrams and Hashtag Use as Features

Target     Class     Precision (%)   Recall (%)   F-Measure (%)
Target-1   Favor     75.0            90.9         82.2
           Against   88.4            69.7         78.0
           Average   81.7            80.3         80.1
Target-2   Favor     70.0            85.1         76.8
           Against   81.0            63.4         71.2
           Average   75.5            74.3         74.0

When the results given in Table 2 are compared with the results in Table 1, a slight decrease in F-Measure (0.5%) for Target-1 is observed, while the overall F-Measure value for Target-2 has increased by 1.8%. Although we could not derive sound conclusions mainly due to the relatively small size of our data set, the increase in the performance of the SVM classifier for Target-2 is encouraging evidence for the exploitation of hashtags in a stance detection system. We leave other ways of exploiting hashtags for stance detection as future work.

To sum up, our evaluation results are significant as reference results to be used for comparison purposes and provide evidence for the utility of unigram-based and hashtag-related features in SVM classifiers for the stance detection problem in Turkish tweets.

4 FUTURE PROSPECTS
Future work based on the current study includes the following:

• The presented stance-annotated data set for Turkish has been created by one annotator only (the author of this study), so the data set should be revised and extended through crowdsourcing facilities. When employing such a procedure, other stance classes like Neither can be considered as well. The procedure will improve the quality of the data set as well as the quality of prospective systems to be trained and tested on it.
• Other features like emoticons (as commonly used for sentiment analysis), features based on hashtags, and ngram features can also be used by the classifiers and these classifiers can be tested on larger data sets. Other classification approaches could also be implemented and tested against our baseline classifiers. Particularly, related methods presented in recent studies such as [12] can be tested on our data set.
• Lastly, the SVM classifiers utilized in this study and their prospective versions utilizing other features can be tested on stance data sets in other languages (such as English) for comparison purposes.

5 CONCLUSION
Stance detection is a considerably new research area in natural language processing and is considered within the scope of the well-studied topic of sentiment analysis. It is the detection of stance within text towards a target which may be explicitly specified in the text or not. In this study, we present a stance-annotated tweet data set in Turkish where the targets of the annotated stances are two popular sports clubs in Turkey. The corresponding annotations are made publicly-available for research purposes. To the best of our knowledge, this is the first stance detection data set for the Turkish language and also the first sports-related stance-annotated data set. Also presented in this study are SVM classifiers (one for each target) utilizing unigram and bigram features in addition to using the existence of hashtags as another feature. 10-fold cross-validation results of these classifiers are presented which can be used as reference results by prospective systems. Both the annotated data set and the classifiers with evaluations are significant since they are the initial contributions to the stance detection problem in Turkish tweets.

REFERENCES
[1] Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance detection with bidirectional conditional encoding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[2] Fazli Can, Seyit Kocberber, Erman Balcik, Cihan Kaynak, H Cagdas Ocalan, and Onur M Vursavas. 2008. Information retrieval on Turkish texts. Journal of the American Society for Information Science and Technology 59, 3 (2008), 407–421.
[3] Zhong-Yong Chen and Chien Chin Chen. 2016. SCIFNET: Stance community identification of topic persons using friendship network analysis. Knowledge-Based Systems 110 (2016), 30–48.
[4] Shiri Dori-Hacohen. 2015. Controversy Detection and Stance Analysis. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1057–1057.
[5] Adam Faulkner. 2014. Automated classification of stance in student essays: An approach using stance target information and the Wikipedia link-based measure. In Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10–18.
[7] Kazi Saidul Hasan and Vincent Ng. 2013. Stance Classification of Ideological Debates: Data, Models, Features, and Constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing. 1348–1356.
[8] Dilek Küçük. 2011. Exploiting Information Extraction Techniques for Automatic Semantic Annotation and Retrieval of News Videos in Turkish. Ph.D. Dissertation. Middle East Technical University.
[9] Emine Ela Küçük, Kürşad Yapar, Dilek Küçük, and Doğan Küçük. 2017. Ontology-based automatic identification of public health-related Turkish tweets. Computers in Biology and Medicine 83 (2017), 1–9.
[10] Amita Misra and Marilyn A Walker. 2013. Topic independent identification of agreement and disagreement in social media dialogue. In Conference of the Special Interest Group on Discourse and Dialogue. 920.
[11] Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. A dataset for detecting stance in tweets. In Proceedings of the Language Resources and Evaluation Conference (LREC).
[12] Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. Semeval-2016 task 6: Detecting stance in tweets. In Proceedings of the International Workshop on Semantic Evaluation, SemEval.
[13] Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1-2 (2008), 1–135.
[14] John C. Platt. 1999. Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Advances in Kernel Methods (1999), 185–208.
[15] Hamid Poursepanj, Josh Weissbock, and Diana Inkpen. 2013. uOttawa: System description for SemEval 2013 Task 2 Sentiment Analysis in Twitter. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013). 380–383.
[16] Swapna Somasundaran and Janyce Wiebe. 2010. Recognizing stances in ideological on-line debates. In Proceedings of the NAACL HLT Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. 116–124.
[17] Marilyn A Walker, Pranav Anand, Robert Abbott, and Ricky Grant. 2012. Stance classification using dialogic properties of persuasion. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 592–596.