=Paper=
{{Paper
|id=Vol-1885/176
|storemode=property
|title=Detecting Stance in Czech News Commentaries
|pdfUrl=https://ceur-ws.org/Vol-1885/176.pdf
|volume=Vol-1885
|authors=Tomáš Hercig,Peter Krejzl,Barbora Hourová,Josef Steinberger,Ladislav Lenc
|dblpUrl=https://dblp.org/rec/conf/itat/HercigKHSL17
}}
==Detecting Stance in Czech News Commentaries==
J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 176–180
CEUR Workshop Proceedings Vol. 1885, ISSN 1613-0073, © 2017 T. Hercig, P. Krejzl, B. Hourová, J. Steinberger, L. Lenc
Tomáš Hercig (1,2), Peter Krejzl (1), Barbora Hourová (1), Josef Steinberger (1), Ladislav Lenc (1,2)
(1) Department of Computer Science and Engineering, Faculty of Applied Sciences,
University of West Bohemia, Univerzitní 8, 306 14 Plzeň, Czech Republic
(2) NTIS – New Technologies for the Information Society, Faculty of Applied Sciences,
University of West Bohemia, Technická 8, 306 14 Plzeň, Czech Republic
nlp.kiv.zcu.cz
{tigi,krejzl,hourova,steinberger,llenc}@kiv.zcu.cz
Abstract: This paper describes our system created to detect stance in online discussions. The goal is to identify whether the author of a comment is in favor of the given target or against. We created an extended corpus of Czech news comments and evaluated a support vector machines classifier, a maximum entropy classifier, and a convolutional neural network.

Keywords: Stance Detection, Opinion Mining, Sentiment Analysis

===1 Introduction===

Stance detection has been defined as automatically determining from text whether the author is in favor of the given target entity (person, movement, topic, proposition, etc.), against it, or whether neither inference is likely.

Stance detection can be viewed as a subtask of opinion mining, similar to sentiment analysis. In sentiment analysis, systems determine whether a piece of text is positive, negative, or neutral. In stance detection, however, systems predict the author's favorability towards a given target, which may not even be explicitly mentioned in the text. Moreover, the text may express a positive opinion about an entity contained in the text, while one can also infer that the author is against the defined target (an entity or a topic). It has been found difficult to infer stance towards a target of interest from tweets that express opinion towards another entity [10].

There are many applications which could benefit from automatic stance detection, including information retrieval, textual entailment, and text summarization, in particular opinion summarization.

We created an extended corpus for stance detection in Czech, evaluated standard top-performing models on this dataset, and report the results.

The rest of this paper is organized as follows. We summarise the related work in Section 2. The creation of the corpus is covered in Section 3. Our approach is described in Section 4. The convolutional neural network architecture is described in Section 5. Evaluation and discussion of the results are in Section 6, and future work is proposed in Section 7.

===2 Related Work===

The SemEval-2016 task Detecting Stance in Tweets (http://alt.qcri.org/semeval2016/task6/) [10] had two subtasks: supervised and weakly supervised stance identification.

The goal of both subtasks was to classify tweets into three classes (In favor, Against, and Neither). The performance was measured by the macro-averaged F1-score of two classes (In favor and Against). This evaluation measure does not disregard the Neither class, because falsely labelling the Neither class as In favor or Against still affects the scores. We use the same evaluation metric (F1_2), accuracy, and the F1-score of all classes (F1_3).

The supervised task (subtask A) tested stance towards five targets: Atheism, Climate Change is a Real Concern, Feminist Movement, Hillary Clinton, and Legalization of Abortion. Participants were provided with 2814 labeled training tweets for the five targets.

A detailed distribution of stances for each target is given in Table 1. The distribution is not uniform and there is always a preference towards a certain stance (e.g., 63% of tweets about Atheism are labeled as Against). The distribution reflects the real-world scenario, in which a majority of people tend to take a similar stance. It also depends on the source of the data. For example, in the case of Legalization of Abortion, we can assume that the distribution will be significantly different in religious communities than in atheistic communities.

For the weakly supervised task (subtask B), there were no labeled training data, but participants could use a large number of tweets related to the single target: Donald Trump.

The best results for subtask A were achieved by an advanced baseline using an SVM classifier with unigrams, bigrams, and trigrams along with character n-grams (2-, 3-, 4-, and 5-grams) as features.

Wei et al. [12] achieved the best result for subtask B and were a close second in subtask A of the SemEval stance detection task. They used a convolutional neural network (CNN) designed according to Kim [4]. It utilizes the same kernel widths and numbers of filters as proposed by Kim. Pre-trained word2vec embeddings are used for initialization of the embedding layer. The main difference from
Kim's network is the voting scheme used. During each training epoch, several iterations are selected to predict the test set. At the end of each epoch, the majority voting scheme is applied to determine the label for each sentence. This is done over a specified number of epochs and finally the same voting is applied to the results of each epoch. The train and test data are separated according to the stance targets.

The initial research on Czech data was done in [7]. They collected 1,460 comments from a Czech news server (www.idnes.cz) related to two topics: the Czech president “Miloš Zeman” (181 In favor, 165 Against, and 301 Neither) and “Smoking Ban in Restaurants” (168 In favor, 252 Against, and 393 Neither).

The results with the maximum entropy classifier were F1_2 = 0.435 and F1_3 = 0.52 for “Miloš Zeman”, and F1_2 = 0.456 and F1_3 = 0.54 for “Smoking Ban in Restaurants”, where F1_2 is the F1-score over In favor/Against and F1_3 is the F1-score over In favor/Against/Neither.

{| class="wikitable"
|+ Table 1: Statistics of the SemEval-2016 task corpora in terms of the number of tweets and stance labels.
! Target Entity !! Total !! In favor !! Against !! Neither
|-
| Atheism || 733 || 124 (17%) || 464 (63%) || 145 (20%)
|-
| Climate Change is Concern || 564 || 335 (59%) || 26 (5%) || 203 (36%)
|-
| Feminist Movement || 949 || 268 (28%) || 511 (54%) || 170 (18%)
|-
| Hillary Clinton || 934 || 157 (17%) || 533 (57%) || 244 (26%)
|-
| Legalization of Abortion || 883 || 151 (17%) || 523 (59%) || 209 (24%)
|-
| All || 4,063 || 1,035 (25%) || 2,057 (51%) || 971 (24%)
|}

===3 Dataset===

We extended the dataset from [7], nearly quadrupling its size. The detailed annotation procedure is described in the master's thesis [3] (in Czech). The whole corpus was annotated by three native speakers. The distribution of stances for each target is given in Table 2.

{| class="wikitable"
|+ Table 2: Statistics of the Czech corpora in terms of the number of news comments and stance labels.
! Target Entity !! Total !! In favor !! Against !! Neither
|-
| “Miloš Zeman” – Czech president || 2,638 || 691 (26%) || 1,263 (48%) || 684 (26%)
|-
| “Smoking Ban in Restaurants” – Gold || 1,388 || 272 (20%) || 485 (35%) || 631 (45%)
|-
| “Smoking Ban in Restaurants” – All || 2,785 || 744 (27%) || 1,280 (46%) || 761 (27%)
|}

The “Miloš Zeman” part of the dataset was annotated by one annotator and then 302 comments were also labeled by a second annotator to measure inter-annotator agreement. The “Smoking Ban in Restaurants” part of the dataset was independently annotated by two annotators. To resolve conflicts, a third annotator was used and the majority voting scheme was applied to select the gold label. The inter-annotator agreement (Cohen's κ) was calculated between two annotators on 2,203 comments. The final κ is 0.579 for “Miloš Zeman” (2,638 comments) and 0.423 for “Smoking Ban in Restaurants” (2,785 comments).

The inter-annotator agreement for the target “Smoking Ban in Restaurants” was quite low, thus we selected the subset of the “Smoking Ban in Restaurants” part of the dataset where the original two annotators assigned the same label as the gold dataset (1,388 comments).

The corpus is available for research purposes at http://nlp.kiv.zcu.cz/research/sentiment#stance.

===4 The Approach Overview===

We evaluate common supervised classifiers, namely the maximum entropy classifier and the support vector machines (SVM) classifier from Brainy [6]. We also experimented with top-performing models for sentiment analysis and stance detection, in particular a convolutional neural network. The models were trained separately for each target entity.

====4.1 Preprocessing====

The same preprocessing has been done for all datasets. We use UDPipe [11] with Czech Universal Dependencies 1.2 models for tokenization, POS tagging and lemmatization. Stemming has been done by the HPS stemmer [2]. Preliminary experiments have shown that lower-casing the data achieves slightly better results, thus all the experiments are performed with lower-cased data.

====4.2 Features====

We selected features commonly used in similar natural language processing tasks, e.g. sentiment analysis. The following baseline features were used:
* Character n-gram – Separate binary feature for each character n-gram in the text. We do it separately for different orders n ∈ {3, 5, 7}. (Note that words, e.g. the emoticon “:-)”, would be separated by spaces during tokenization, resulting in “: - )”.)
* Bag of words – Word occurrences in the text.
* Bag of adverbs – Bag of adverbs from the text.
* Bag of adjectives – Bag of adjectives from the text.
* Negative emoticons – We used a list of negative emoticons (":-(", ";-(", ":-/", "8-o", ";-e", ";-O", "Rv") specific to the news commentaries source. The feature captures the presence of an emoticon within the text.
* Word shape – We assign words into one of 24 classes, similar to the function specified in [1]. We use edu.stanford.nlp.process.WordShapeClassifier [9] with the WORDSHAPECHRIS1 setting.

We experimented with additional features such as n-grams, text length, etc., but using these features did not lead to better results. Bag of words, adjectives and adverbs use the word lemma or stem. We report results for various feature combinations and perform an ablation study of the best feature set.

===5 Convolutional Neural Network===

The architecture of the proposed CNN is depicted in Figure 1. We use an architecture similar to the one proposed in [8]. The input layer of the network receives a sequence of word indices from a dictionary. The input vector must be of a fixed length. We solve this issue by padding the input sequence to the maximum text length occurring in the train data, denoted M. A special “PADDING” token is used for this purpose. The embedding layer maps the word indices to real-valued embedding vectors of length L. The convolutional layer consists of N_C kernels containing k × 1 units and uses the rectified linear unit (ReLU) activation function. The convolutional layer is followed by a max-pooling layer and dropout for regularization. The max-pooling layer takes maxima from patches of size (M − k + 1) × 1. The output of the max-pooling layer is fed into a fully-connected layer. It is followed by the output layer with 3 neurons, which corresponds to the number of classes, and has a softmax activation function.

In our experimental setup we use the embedding dimensionality L = 300 and N_C = 40 convolutional kernels with 5 × 1 units. The penultimate fully-connected layer contains 256 neurons. We train the network using the adaptive moment estimation (Adam) optimization algorithm [5], with cross-entropy as the loss function.

Figure 1: Neural network architecture.

===6 Results===

We used 20-fold cross-validation for model evaluation to compensate for the small size of the dataset and to prevent overfitting.

For all experiments we report the macro-averaged F1-score of two classes F1_2 (In favor and Against), which is the official metric for the SemEval-2016 stance detection task [10], accuracy, and the macro-averaged F1-score of all three classes (F1_3).

Table 3 shows results for each dataset. CNN-1 is described in Section 5 and CNN-2 is the architecture proposed in [4]. We achieved the best results on average with the maximum entropy classifier with the feature set consisting of lemma unigrams, word shape, bag of adjectives, bag of adverbs, and character n-grams (n ∈ {3, 5, 7}). We further performed an ablation study of this combination of features. In Table 3 the bold numbers denote the five best results for the given column, and in the ablation study they denote features with no gain in the given column (i.e. feature sets with no loss).

Both CNNs achieved good results. CNN-2 was slightly better, which is not surprising as it was designed for sentiment analysis, while CNN-1 was previously used for document classification. Surprisingly, stems worked better than lemmas as the word input for both neural networks. The ablation study shows that the word shape, bag of adjectives, and bag of adverbs features present little to no information gain for the classifier, thus these features should be discarded or
readjusted to better capture the stance in comments. However, the selected feature combination still performed reasonably well.

The best results for the target “Miloš Zeman” were achieved by CNN-2 in terms of accuracy and F1_3, while F1_2 was the highest for the maximum entropy classifier with lemma unigrams, word shape, bag of adjectives, and character n-grams. The entity “Smoking Ban in Restaurants” was best assessed by SVM with the selected feature set for all data and by the maximum entropy classifier with the same feature set for the gold dataset.

Character n-grams alone present a strong baseline for this task.

{| class="wikitable"
|+ Table 3: Results on Czech stance detection datasets in %. We report accuracy (Acc), the macro-averaged F1-score of two classes (F1_2) and the macro-averaged F1-score of all three classes (F1_3). Feature set consists of lemma unigrams, word shape, bag of adjectives, bag of adverbs, and character n-grams (n ∈ {3, 5, 7}). The bold numbers denote five best results for given column and in the ablation study they denote features with no gain in the given column (i.e. feature sets with no loss).
! rowspan="2" | Classifier !! rowspan="2" | Features !! colspan="3" | Zeman !! colspan="3" | Smoking All !! colspan="3" | Smoking Gold
|-
! F1_3 !! F1_2 !! Acc !! F1_3 !! F1_2 !! Acc !! F1_3 !! F1_2 !! Acc
|-
| SVM || Random Class || 32.7 || 34.6 || 33.4 || 32.4 || 34.4 || 33.0 || 31.2 || 27.2 || 32.2
|-
| SVM || Majority Class || 21.6 || 32.4 || 47.9 || 21.0 || 31.5 || 46.0 || 20.8 || 0.0 || 45.5
|-
| CNN-1 || lemma || 48.6 || 52.1 || 51.9 || 51.4 || 54.2 || 54.2 || 61.2 || 55.6 || 65.1
|-
| CNN-1 || stem || 50.7 || 55.3 || 54.5 || 51.7 || 54.6 || 54.5 || 60.6 || 54.8 || 64.8
|-
| CNN-2 || lemma || 48.3 || 51.7 || 51.3 || 51.8 || 54.9 || 54.5 || 61.2 || 55.9 || 64.8
|-
| CNN-2 || stem || 51.3 || 55.7 || 54.9 || 52.1 || 54.9 || 54.6 || 61.7 || 56.4 || 65.5
|-
| MaxEnt || lemma || 47.7 || 51.8 || 50.2 || 48.8 || 52.3 || 50.9 || 58.1 || 52.2 || 61.6
|-
| SVM || lemma || 46.7 || 52.0 || 50.7 || 50.4 || 55.3 || 53.8 || 60.1 || 54.5 || 63.5
|-
| MaxEnt || stem || 47.2 || 50.9 || 49.5 || 49.5 || 52.5 || 51.8 || 58.3 || 52.2 || 62.2
|-
| SVM || stem || 48.3 || 52.8 || 51.8 || 51.5 || 55.3 || 54.2 || 57.3 || 52.4 || 60.6
|-
| MaxEnt || char. n-gram 3,5,7 || 50.4 || 55.7 || 53.7 || 50.3 || 54.9 || 53.1 || 61.6 || 56.8 || 65.0
|-
| SVM || char. n-gram 3,5,7 || 47.4 || 53.4 || 52.2 || 51.3 || 57.2 || 54.9 || 57.6 || 53.2 || 60.8
|-
| MaxEnt || shape || 45.0 || 50.2 || 48.4 || 45.7 || 50.2 || 47.9 || 53.9 || 48.6 || 57.0
|-
| SVM || shape || 45.5 || 50.3 || 49.7 || 48.1 || 52.0 || 50.6 || 56.5 || 50.8 || 60.7
|-
| MaxEnt || feature set || 50.6 || 56.0 || 53.9 || 51.9 || 55.8 || 54.7 || 62.6 || 57.5 || 66.5
|-
| SVM || feature set || 47.9 || 54.3 || 52.7 || 52.6 || 58.2 || 56.0 || 59.8 || 55.3 || 62.9
|-
| MaxEnt || feature set + emoticons || 50.5 || 56.0 || 53.9 || 51.6 || 55.7 || 54.2 || 62.7 || 57.7 || 66.4
|-
| SVM || feature set + emoticons || 47.3 || 53.3 || 51.9 || 52.3 || 58.1 || 55.6 || 61.0 || 56.8 || 63.5
|-
| MaxEnt || feature set + emoticons + stem || 50.7 || 56.0 || 53.9 || 51.9 || 55.5 || 54.5 || 62.6 || 57.6 || 66.3
|-
| SVM || feature set + emoticons + stem || 47.7 || 53.5 || 52.2 || 51.6 || 57.4 || 55.0 || 60.6 || 55.6 || 64.1
|-
| MaxEnt || feature set - shape || 50.8 || 56.0 || 54.0 || 51.6 || 56.1 || 54.4 || 63.0 || 58.3 || 66.5
|-
| MaxEnt || feature set - bag of adj. || 50.7 || 56.1 || 54.0 || 51.8 || 55.4 || 54.4 || 62.7 || 57.7 || 66.4
|-
| MaxEnt || feature set - bag of adv. || 50.9 || 56.4 || 54.3 || 51.8 || 55.4 || 54.6 || 62.6 || 57.4 || 66.5
|-
| MaxEnt || feature set - lemma || 50.2 || 55.6 || 53.6 || 50.8 || 55.1 || 53.6 || 62.4 || 57.3 || 66.2
|-
| MaxEnt || feature set - char. n-gram 3,5,7 || 46.9 || 51.7 || 49.9 || 48.6 || 52.3 || 50.9 || 58.1 || 52.3 || 61.8
|}

===7 Conclusion===

The paper describes our system created to detect stance in online discussions. We evaluated top-performing models used for sentiment analysis and stance detection. We conducted feature ablation and concluded that more features still need to be readjusted for this task.

The features used are very common in natural language processing; however, even in the SemEval-2016 stance detection task, the best results were achieved by commonly used features. This suggests that stance detection is still in its infancy and more gain can be expected in the future as researchers better understand this new task.

In future work, we plan to extend the dataset to other domains and include more target entities and comments, which will let us draw stronger conclusions and move the task closer to industrial expectations. Given that there are vast amounts of news comments related to highly discussed topics, we will study stance summarization, which should aim at identifying the most important arguments. Another interesting experiment would be supplementing the dataset with sentiment annotation.
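
The measures reported in this paper (accuracy, the macro-averaged F1_2 over In favor and Against, the macro-averaged F1_3 over all three classes, and Cohen's κ for inter-annotator agreement) can be illustrated with a minimal sketch. It is not the evaluation code used for the experiments; the label names, function names, and the toy data below are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical label names; the corpus may encode the three stances differently.
FAVOR, AGAINST, NEITHER = "FAVOR", "AGAINST", "NEITHER"


def f1_for_label(gold, pred, label):
    """F1-score of a single class from paired gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives; also covers the undefined 0/0 case
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1-scores over the chosen labels."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def cohen_kappa(ann_a, ann_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ann_a)
    observed = sum(1 for a, b in zip(ann_a, ann_b) if a == b) / n
    count_a, count_b = Counter(ann_a), Counter(ann_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label (a convention)
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for this sketch.
gold = [FAVOR, FAVOR, AGAINST, AGAINST, NEITHER, NEITHER]
pred = [FAVOR, AGAINST, AGAINST, AGAINST, FAVOR, NEITHER]

f1_2 = macro_f1(gold, pred, [FAVOR, AGAINST])           # two-class measure
f1_3 = macro_f1(gold, pred, [FAVOR, AGAINST, NEITHER])  # three-class measure
acc = accuracy(gold, pred)
kappa = cohen_kappa(gold, pred)  # here the two lists play the role of two annotators

# Constant predictions, as produced by a majority-class baseline.
f1_2_all_against = macro_f1(gold, [AGAINST] * len(gold), [FAVOR, AGAINST])
f1_2_all_neither = macro_f1(gold, [NEITHER] * len(gold), [FAVOR, AGAINST])
```

In this sketch a constant Against prediction still obtains a non-zero F1_2, whereas a constant Neither prediction obtains an F1_2 of zero. This matches the Majority Class rows of Table 3, where F1_2 is 0.0 only for the gold smoking dataset, whose most frequent label in Table 2 is Neither.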
===Acknowledgments===

This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports under the program NPU I., by Grant No. SGS-2016-018 Data and Software Engineering for Advanced Applications and by project MediaGist, EU’s FP7 People Programme (Marie Curie Actions), no. 630786.

===References===

[1] Daniel M. Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. Nymble: a high-performance learning name-finder. In Proceedings of the fifth conference on Applied natural language processing, pages 194–201. Association for Computational Linguistics, 1997.

[2] Tomáš Brychcín and Miloslav Konopík. HPS: High precision stemmer. Information Processing & Management, 51(1):68–91, 2015.

[3] Barbora Hourová. Automatic detection of argumentation. Master’s thesis, University of West Bohemia, Faculty of Applied Sciences, 2017.

[4] Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar, October 2014. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/D14-1181.

[5] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[6] Michal Konkol. Brainy: A machine learning library. In Leszek Rutkowski, Marcin Korytkowski, Rafal Scherer, Ryszard Tadeusiewicz, Lotfi Zadeh, and Jacek Zurada, editors, Artificial Intelligence and Soft Computing, volume 8468 of Lecture Notes in Computer Science, pages 490–499. Springer International Publishing, 2014. ISBN 978-3-319-07175-6.

[7] Peter Krejzl, Barbora Hourová, and Josef Steinberger. Stance detection in online discussions. In Mária Bieliková and Ivan Srba, editors, WIKT & DaZ 2016 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge, pages 211–214. Vydatel’stvo STU, Vazovova 5, Bratislava, Slovakia, November 2016. ISBN 978-80-227-4619-9.

[8] Ladislav Lenc and Pavel Král. Deep neural networks for Czech multi-label document classification. CoRR, abs/1701.03849, 2017. URL http://arxiv.org/abs/1701.03849.

[9] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-5010.

[10] Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. SemEval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41, San Diego, California, June 2016. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/S16-1003.

[11] Milan Straka, Jan Hajič, and Jana Straková. UDPipe: trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Paris, France, May 2016. European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1.

[12] Wan Wei, Xiao Zhang, Xuqin Liu, Wei Chen, and Tengjiao Wang. pkudblab at SemEval-2016 task 6: A specific convolutional neural network system for effective stance detection. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 384–388, San Diego, California, June 2016. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/S16-1062.