Stance Detection in Turkish Tweets
                                                                  Dilek Küçük
                                                     Electrical Power Technologies Group
                                                          TÜBİTAK Energy Institute
                                                                 Ankara, Turkey
                                                          dilek.kucuk@tubitak.gov.tr

ABSTRACT
Stance detection is a classification problem in natural language processing where, for a text and
target pair, a class result from the set {Favor, Against, Neither} is expected. It is similar to the
sentiment analysis problem, but instead of the sentiment of the text author, the stance expressed
for a particular target is investigated in stance detection. In this paper, we present a stance
detection tweet data set for Turkish comprising stance annotations of these tweets for two popular
sports clubs as targets. Additionally, we provide the evaluation results of SVM classifiers for each
target on this data set, where the classifiers use unigram, bigram, and hashtag features. This study
is significant as it presents one of the initial stance detection data sets proposed so far and the
first one for the Turkish language, to the best of our knowledge. The data set and the evaluation
results of the corresponding SVM-based approaches will form plausible baselines for the comparison
of future studies on stance detection.

CCS CONCEPTS
• Information systems → Sentiment analysis; Web and social media search; • Computing methodologies
→ Language resources;

KEYWORDS
Stance detection, Turkish, social media analysis, SVM, unigrams

Reference format:
Dilek Küçük. 2017. Stance Detection in Turkish Tweets. In Proceedings of Workshop on Social Media
World Sensors (SIDEWAYS), Prague, Czech Republic, July 2017 (SIDEWAYS’17), 4 pages, CEUR-WS.org.

SIDEWAYS’17, Prague, Czech Republic
Copyright held by the author(s).

1 INTRODUCTION
Stance detection (also called stance identification or stance classification) is a relatively recent
research topic in natural language processing (NLP). It is usually defined as a classification
problem where, for a text and target pair, the stance of the author of the text for that target is
expected as a classification output from the set: {Favor, Against, Neither} [12].
   Stance detection is usually considered a subtask of the sentiment analysis (opinion mining) topic
in NLP [13]. Both are mostly performed on social media texts, particularly on tweets, hence both are
important components of social media analysis. Nevertheless, in sentiment analysis, the sentiment of
the author of a piece of text is explored, usually as Positive, Negative, or Neutral, while in
stance detection, the stance of the author of the text for a particular target (an entity, event,
etc.) either explicitly or implicitly referred to in the text is considered. Like sentiment
analysis, stance detection systems can be valuable components of information retrieval and other
text analysis systems [12].
   Previous work on stance detection includes [16], where a stance classifier based on sentiment and
arguing features is proposed in addition to an automatically compiled arguing lexicon. The ultimate
approach performs better than distribution-based and unigram-based baseline systems [16]. In [17],
the authors show that the use of dialogue structure improves stance detection in on-line debates. In
[7], Hasan and Ng carry out stance detection experiments using different machine learning
algorithms, training data sets, features, and inter-post constraints in on-line debates, and draw
insightful conclusions based on these experiments. For instance, they find that sequence models like
HMMs perform better at stance detection when compared with non-sequence models like Naive Bayes (NB)
[7]. In another related study [10], the authors conclude that topic-independent features can be
exploited for disagreement detection in on-line dialogues. The employed features include agreement,
cue words, denial, hedges, duration, polarity, and punctuation [10]. Stance detection on a corpus of
student essays is considered in [5]. After using linguistically-motivated feature sets together with
multivalued NB and SVM as the learning models, the authors conclude that they outperform two
baseline approaches [5]. In [4], the author claims that Wikipedia can be used to determine stances
about controversial topics based on their previous work regarding controversy extraction on the Web.
   Among more recent related work, in [1] stance detection for unseen targets is studied and
bidirectional conditional encoding is employed. The authors state that their approach achieves
state-of-the-art performance rates [1] on the SemEval 2016 Twitter Stance Detection corpus [12]. In
[3], a stance-community detection approach called SCIFNET is proposed. SCIFNET creates networks of
people who are stance targets, automatically from the related document collections [3] using stance
expansion and refinement techniques to arrive at stance-coherent networks. A tweet data set
annotated with stance information regarding six predefined targets is proposed in [11] where this
data set is annotated through crowdsourcing. The authors indicate that the data set is also
annotated with sentiment information in addition to stance, so it can help reveal
associations between stance and sentiment [11]. Lastly, in [12], SemEval 2016’s aforementioned
shared task on Twitter Stance Detection is described. Also provided are the results of the
evaluations of 19 systems participating in two subtasks (one with training data set provided and the
other without an annotated data set) of the shared task [12].
   In this paper, we present a tweet data set in Turkish annotated with stance information, where
the corresponding annotations are made publicly available. The domain of the tweets comprises two
popular football clubs which constitute the targets of the tweets included. We also provide the
evaluation results of SVM classifiers (for each target) on this data set using unigram, bigram, and
hashtag features.
   To the best of our knowledge, the current study is the first one to target stance detection in
Turkish tweets. Together with the provided annotated data set and the corresponding evaluations with
the aforementioned SVM classifiers which can be used as baseline systems, our study will hopefully
help increase social media analysis studies on Turkish content.
   The rest of the paper is organized as follows: In Section 2, we describe our tweet data set
annotated with the target and stance information. Section 3 includes the details of our SVM-based
stance classifiers and their evaluation results with discussions. Section 4 includes future research
topics based on the current study, and finally Section 5 concludes the paper with a summary.

2 A STANCE DETECTION DATA SET
We have decided to consider tweets about popular sports clubs as our domain for stance detection. A
considerable number of tweets are published about sports-related events at every instant. Hence we
have determined our targets as Galatasaray (namely, Target-1) and Fenerbahçe (namely, Target-2),
which are two of the most popular football clubs in Turkey. As is the case for sentiment analysis
tools, the outputs of stance detection systems on a stream of tweets about these clubs can
facilitate the use of the opinions of the football followers by these clubs.
   In a previous study on the identification of public health-related tweets, two tweet data sets in
Turkish (each set containing 1 million random tweets) have been compiled where these sets belong to
two different periods of 20 consecutive days [9]. We have decided to use one of these sets
(corresponding to the period between August 18 and September 6, 2015) and firstly filtered the
tweets using the possible names used to refer to the target clubs. Then, we have annotated the
stance information in the tweets for these targets as Favor or Against. Within the course of this
study, we have not considered those tweets in which the target is not explicitly mentioned, since
our initial name-based filtering retrieves only tweets that mention a target explicitly.
   For the purposes of the current study, we have not annotated any tweets with the Neither class.
This stance class and even finer-grained classes can be considered in further annotation studies. We
should also note that in a few tweets, the target of the stance was the management of the club while
in some others a particular footballer of the club is praised or criticised. Still, we have
considered the club as the target of the stance in all of the cases and carried out our annotations
accordingly.
   At the end of the annotation process, we have annotated 700 tweets, where 175 tweets are in favor
of and 175 tweets are against Target-1, and similarly 175 tweets are in favor of and 175 are against
Target-2. Hence, our data set is a balanced one although it is currently limited in size. The
corresponding stance annotations are made publicly available at
http://ceng.metu.edu.tr/~e120329/Turkish_Stance_Detection_Tweet_Dataset.csv in Comma Separated
Values (CSV) format. The file contains three columns with the corresponding headers. The first
column is the tweet id of the corresponding tweet, the second column contains the name of the stance
target, and the last column includes the stance of the tweet for the target as Favor or Against.
   To the best of our knowledge, this is the first publicly-available stance-annotated data set for
Turkish. Hence, it is a significant resource as there is a scarcity of annotated data sets,
linguistic resources, and NLP tools available for Turkish. Additionally, to the best of our
knowledge, it is also significant for being the first stance-annotated data set including
sports-related tweets, as previous stance detection data sets mostly include on-line texts on
political/ethical issues.

3 STANCE DETECTION EXPERIMENTS USING SVM CLASSIFIERS
It is emphasized in the related literature that unigram-based methods are reliable for the stance
detection task [16] and similarly unigram-based models have been used as baseline models in studies
such as [12]. In order to be used as a baseline and reference system for further studies on stance
detection in Turkish tweets, we have trained two SVM classifiers (one for each target) using
unigrams as features. Before the extraction of unigrams, we have employed automated preprocessing to
filter out the stopwords in our annotated data set of 700 tweets. The stopword list used is the list
presented in [8] which, in turn, is the slightly extended version of the stopword list provided in
[2].
   We have used the SVM implementation available in the Weka data mining application [6] where this
particular implementation employs the SMO algorithm [14] to train a classifier with a linear kernel.
The 10-fold cross-validation results of the two classifiers are provided in Table 1 using the
metrics of precision, recall, and F-Measure.

Table 1: Evaluation Results of the Unigram-based SVM Classifiers

 Target      Class       Precision (%)    Recall (%)   F-Measure (%)
 Target-1    Favor           75.2           92.0           82.8
             Against         89.7           69.7           78.5
             Average         82.5           80.9           80.6
 Target-2    Favor           68.5           83.4           75.3
             Against         78.8           61.7           69.2
             Average         73.7           72.6           72.2

   The evaluation results are quite favorable for both targets and particularly higher for Target-1,
considering the fact that they are the initial experiments on the data set. The performance of the
classifiers is better for the Favor class for both targets when compared with the performance
results for the Against class. This outcome may be due to the common use of some terms when
expressing positive stance towards sports clubs in Turkish tweets. The same percentage of common
terms may not have been observed in tweets during the expression of negative stances towards the
targets. Yet, the completely opposite pattern is observed in the stance detection results of the
baseline systems given in [12], i.e., better F-Measure rates have been obtained for the Against
class when compared with the Favor class [12]. Some of the baseline systems reported in [12] are
SVM-based systems using unigrams and ngrams as features similar to our study, but their data sets
include all three stance classes of Favor, Against, and Neither, while our data set comprises only
tweets classified as belonging to the Favor or Against classes. Another difference is that the data
sets in [12] have been divided into training and test sets, while in our study we provide 10-fold
cross-validation results on the whole data set. On the other hand, we should also note that
SVM-based sentiment analysis systems (such as those given in [15]) have been reported to achieve
better F-Measure rates for the Positive sentiment class when compared with the results obtained for
the Negative class. Therefore, our evaluation results for each stance class seem to be in line with
such sentiment analysis systems. Yet, further experiments on the extended versions of our data set
should be conducted and the results should again be compared with the stance detection results given
in the literature.
   We have also evaluated SVM classifiers which use only bigrams as features, as ngram-based
classifiers have been reported to perform better for the stance detection problem [12]. However, we
have observed that using bigrams as the sole features of the SVM classifiers leads to quite poor
results. This observation may be due to the relatively limited size of the tweet data set employed.
Still, we can conclude that unigram-based features lead to superior results compared to the results
obtained using bigrams as features, based on our experiments on our data set. Yet, ngram-based
features may be employed on the extended versions of the data set to verify this conclusion within
the course of future work.
   With an intention to exploit the contribution of hashtag use to stance detection, we have also
used the existence of hashtags in tweets as an additional feature to unigrams. The corresponding
evaluation results of the SVM classifiers using unigrams together with the existence of hashtags as
features are provided in Table 2.

Table 2: Evaluation Results of the SVM Classifiers Utilizing Unigrams and Hashtag Use as Features

 Target      Class       Precision (%)    Recall (%)   F-Measure (%)
 Target-1    Favor           75.0           90.9           82.2
             Against         88.4           69.7           78.0
             Average         81.7           80.3           80.1
 Target-2    Favor           70.0           85.1           76.8
             Against         81.0           63.4           71.2
             Average         75.5           74.3           74.0

   When the results given in Table 2 are compared with the results in Table 1, a slight decrease in
F-Measure (0.5 percentage points) for Target-1 is observed, while the overall F-Measure value for
Target-2 has increased by 1.8 percentage points. Although we could not derive sound conclusions
mainly due to the relatively small size of our data set, the increase in the performance of the SVM
classifier for Target-2 is encouraging evidence for the exploitation of hashtags in a stance
detection system. We leave other ways of exploiting hashtags for stance detection as future work.
   To sum up, our evaluation results are significant as reference results to be used for comparison
purposes and provide evidence for the utility of unigram-based and hashtag-related features in SVM
classifiers for the stance detection problem in Turkish tweets.

4 FUTURE PROSPECTS
Future work based on the current study includes the following:

      • The presented stance-annotated data set for Turkish has been created by one annotator only
        (the author of this study); the data set should be revised and extended through
        crowdsourcing facilities. When employing such a procedure, other stance classes like Neither
        can be considered as well. The procedure will improve the quality of the data set as well as
        the quality of prospective systems to be trained and tested on it.
      • Other features like emoticons (as commonly used for sentiment analysis), features based on
        hashtags, and ngram features can also be used by the classifiers and these classifiers can
        be tested on larger data sets. Other classification approaches could also be implemented and
        tested against our baseline classifiers. Particularly, related methods presented in recent
        studies such as [12] can be tested on our data set.
      • Lastly, the SVM classifiers utilized in this study and their prospective versions utilizing
        other features can be tested on stance data sets in other languages (such as English) for
        comparison purposes.

5 CONCLUSION
Stance detection is a considerably new research area in natural language processing and is
considered within the scope of the well-studied topic of sentiment analysis. It is the detection of
stance within text towards a target which may be explicitly specified in the text or not. In this
study, we present a stance-annotated tweet data set in Turkish where the targets of the annotated
stances are two popular sports clubs in Turkey. The corresponding annotations are made publicly
available for research purposes. To the best of our knowledge, this is the first stance detection
data set for the Turkish language and also the first sports-related stance-annotated data set. Also
presented in this study are SVM classifiers (one for each target) utilizing unigram and bigram
features in addition to using the existence of hashtags as another feature. 10-fold cross-validation
results of these classifiers are presented which can be used as reference results by prospective
systems. Both the annotated data set and the classifiers with evaluations are significant since they
are the initial contributions to the stance detection problem in Turkish tweets.
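
As an illustration of the feature scheme and the metrics discussed in Section 3, the following is a minimal sketch in Python, not the Weka/SMO setup used in the experiments. It shows how unigram counts with stopword filtering and a binary hashtag-presence feature could be extracted from a tweet, and how precision, recall, and F-Measure are defined for one stance class. The stopword set, the feature name `__HAS_HASHTAG__`, the function names, and the sample tweet are all assumptions made for illustration; the actual stopword list is the one presented in [8].

```python
import re
from collections import Counter

# Hypothetical mini stopword list, for illustration only; the paper uses the list from [8].
STOPWORDS = {"ve", "bir", "bu", "da", "de"}

# Hypothetical name for the binary "tweet contains a hashtag" feature.
HASHTAG_FEATURE = "__HAS_HASHTAG__"


def extract_features(tweet, stopwords=STOPWORDS):
    """Return unigram counts (stopwords removed) plus a hashtag-presence flag."""
    # Note: str.lower() is not Turkish-aware (dotted/dotless i); a real
    # pipeline would need locale-specific case folding.
    text = tweet.lower()
    tokens = re.findall(r"\w+", text)
    features = Counter(t for t in tokens if t not in stopwords)
    if re.search(r"#\w+", text):
        features[HASHTAG_FEATURE] = 1
    return features


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(gold, predicted, target_class):
    """Compute precision, recall, and F-measure for one stance class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target_class and p == target_class)
    fp = sum(1 for g, p in pairs if g != target_class and p == target_class)
    fn = sum(1 for g, p in pairs if g == target_class and p != target_class)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    print(extract_features("Bu maç harika #Galatasaray"))
    gold = ["Favor", "Favor", "Against", "Against"]
    predicted = ["Favor", "Against", "Against", "Against"]
    print(precision_recall_f(gold, predicted, "Favor"))
```

Under this definition, `f_measure(75.2, 92.0)` evaluates to roughly 82.8, which is consistent with the Favor row for Target-1 in Table 1. Small differences of about 0.1 on other rows are expected, since the tabulated precision and recall values are themselves rounded.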
REFERENCES
 [1] Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance
     detection with bidirectional conditional encoding. In Proceedings of the Conference on
     Empirical Methods in Natural Language Processing (EMNLP).
 [2] Fazli Can, Seyit Kocberber, Erman Balcik, Cihan Kaynak, H Cagdas Ocalan, and Onur M Vursavas.
     2008. Information retrieval on Turkish texts. Journal of the American Society for Information
     Science and Technology 59, 3 (2008), 407–421.
 [3] Zhong-Yong Chen and Chien Chin Chen. 2016. SCIFNET: Stance community identification of topic
     persons using friendship network analysis. Knowledge-Based Systems 110 (2016), 30–48.
 [4] Shiri Dori-Hacohen. 2015. Controversy Detection and Stance Analysis. In Proceedings of the 38th
     International ACM SIGIR Conference on Research and Development in Information Retrieval.
     1057–1057.
 [5] Adam Faulkner. 2014. Automated classification of stance in student essays: An approach using
     stance target information and the Wikipedia link-based measure. In Proceedings of the
     Twenty-Seventh International Florida Artificial Intelligence Research Society Conference.
 [6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA
     Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10–18.
 [7] Kazi Saidul Hasan and Vincent Ng. 2013. Stance Classification of Ideological Debates: Data,
     Models, Features, and Constraints. In Proceedings of the Sixth International Joint Conference on
     Natural Language Processing. 1348–1356.
 [8] Dilek Küçük. 2011. Exploiting Information Extraction Techniques for Automatic Semantic
     Annotation and Retrieval of News Videos in Turkish. Ph.D. Dissertation. Middle East Technical
     University.
 [9] Emine Ela Küçük, Kürşad Yapar, Dilek Küçük, and Doğan Küçük. 2017. Ontology-based automatic
     identification of public health-related Turkish tweets. Computers in Biology and Medicine 83
     (2017), 1–9.
[10] Amita Misra and Marilyn A Walker. 2013. Topic independent identification of agreement and
     disagreement in social media dialogue. In Conference of the Special Interest Group on Discourse
     and Dialogue. 920.
[11] Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. A
     dataset for detecting stance in tweets. In Proceedings of the Language Resources and Evaluation
     Conference (LREC).
[12] Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016.
     Semeval-2016 task 6: Detecting stance in tweets. In Proceedings of the International Workshop
     on Semantic Evaluation, SemEval.
[13] Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in
     Information Retrieval 2, 1-2 (2008), 1–135.
[14] John C. Platt. 1999. Fast Training of Support Vector Machines Using Sequential Minimal
     Optimization. Advances in Kernel Methods (1999), 185–208.
[15] Hamid Poursepanj, Josh Weissbock, and Diana Inkpen. 2013. uOttawa: System description for
     SemEval 2013 Task 2 Sentiment Analysis in Twitter. In Second Joint Conference on Lexical and
     Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation
     (SemEval 2013). 380–383.
[16] Swapna Somasundaran and Janyce Wiebe. 2010. Recognizing stances in ideological on-line debates.
     In Proceedings of the NAACL HLT Workshop on Computational Approaches to Analysis and Generation
     of Emotion in Text. 116–124.
[17] Marilyn A Walker, Pranav Anand, Robert Abbott, and Ricky Grant. 2012. Stance classification
     using dialogic properties of persuasion. In Proceedings of the Conference of the North American
     Chapter of the Association for Computational Linguistics: Human Language Technologies. 592–596.