Same Side Stance Classification

Benno Stein, Yamen Ajjour, Roxanne El Baff, Khalid Al-Khatib
Bauhaus-Universität Weimar, Faculty of Media, Webis Group
.@uni-weimar.de

Philipp Cimiano, Bielefeld University, AG Semantic Computing, cimiano@cit-ec.uni-bielefeld.de
Henning Wachsmuth, Paderborn University, Department of Computer Science, henningw@upb.de

Abstract

This paper introduces the Same Side Stance Classification problem and reports on the outcome of a related shared task, which has been collocated with the Sixth Workshop on Argument Mining at the ACL 2019 in Florence.¹ We have proposed this task as a variant of the well-known stance classification task: Instead of predicting for a single argument whether it has a positive or negative stance towards a given topic, same side classification 'merely' involves the prediction of whether two given arguments share the same stance. The paper in hand provides the rationale for proposing this task, overviews important related work, describes the developed datasets, and reports on the results along with the main methods of the nine submitted systems. We draw conclusions from these results with respect to the suitability of the task as a proxy for measuring progress in the field of argument mining.

¹ https://sameside.webis.de/

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Identifying (i.e., classifying) the stance of an argument towards a particular topic is a fundamental task in computational argumentation and argument mining. The stance of an argument as considered here is a two-valued function: it can either be "pro" a topic (meaning, "yes, I agree"), or "con" the topic ("no, I do not agree").

Here we propose a related though simpler task, called same side stance classification (later also referred to as Πsameside). Same side stance classification deals with the problem of classifying two arguments as to whether they (a) share the same stance or (b) have a different stance towards the topic in question.

As an example, consider the following two arguments on the topic "gay marriage", which obviously are on the same side.

Argument 1. Marriage is a commitment to love and care for your spouse till death. This is what is heard in all wedding vows. Gays can clearly qualify for marriage according to these vows, and any definition of marriage deduced from these vows.

Argument 2. Gay Marriage should be legalized since denying some people the option to marry is discriminatory and creates a second class of citizens.

Argument 3 below, however, is neither on the side of Argument 1 nor on the side of Argument 2.

Argument 3. Marriage is the institution that forms and upholds for society, its values and symbols are related to procreation. To change the definition of marriage to include same-sex couples would destroy its function, because it could no longer represent the inherently procreative relationship of opposite-sex pair-bonding.

Same side stance classification is simpler than the "classical" stance classification problem, or at most equally complex: solving the latter implies solving the former as well.
Aside from the difference in problem complexity, a second aspect renders same side stance classification a relevant task in its own right: Stance classification, by definition, requires knowledge about the topic that an argument is meant to address, i.e., stance classifiers must be trained for a particular topic and hence cannot be reliably applied to other (i.e., across) topics. In contrast, a same side stance classifier does not necessarily need to distinguish between topic-specific pro- and con-vocabulary; "merely" the argument similarity within a stance needs to be assessed. Consequently, same side stance classification is likely to be solvable independently of a topic or a domain—so to speak, in a topic-agnostic fashion. Since topic agnosticity is a big step towards application robustness and flexibility, we believe that the development of technologies that tackle this task has game-changing potential.

Last but not least, same side stance classification has a number of useful and important applications related to both argumentation analytics and information retrieval, including but not limited to the following:

• Measuring the strength of bias within an argumentative utterance (analytics).

• Structuring a discussion (analytics).

• Finding out who or what is challenging in a discussion (analytics, retrieval).

• Filtering wrongly-labeled arguments in a large argument corpus, without relying on knowledge of a topic or a domain (retrieval).

To initiate research on same side stance classification, we carried out a first respective shared task in collocation with the Sixth Workshop on Argument Mining at ACL 2019. We report on this shared task and its results in the paper in hand.

The remainder is organized as follows. Section 2 formalizes the same side stance classification task and relates it to other problems in the field. Section 3 points to relevant research and suggested readings related to stance classification. Section 4 describes the dataset and the experiment settings of the shared task. Section 5 reports on the systems of the nine participating teams and their effectiveness. Section 6 concludes with the lessons learned and the planned follow-up research.

2 Argument Decision Problems

The same side stance classification task, Πsameside, is a decision task in the field of computational argumentation. As outlined in Section 1, mastering this task is beneficial in the context of argumentation analytics and information retrieval. This section provides a succinct formalization of the problem.

The syntax of the argument model underlying Πsameside is rather simple but well-accepted: An argument consists of a conclusion, c, and a set (a conjunction) of premises, P.
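To make this model concrete, here is a minimal Python sketch of the argument representation. It is illustrative only; the class name Argument and the field names are ours and not part of the formalization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    """An argument (c, P): a conclusion c and a set of premises P."""
    conclusion: str
    premises: frozenset[str]

# Illustrative instance, paraphrasing Argument 2 from the introduction:
a2 = Argument(
    conclusion="Gay marriage should be legalized.",
    premises=frozenset({
        "Denying some people the option to marry is discriminatory.",
        "Denying some people the option to marry creates a second class of citizens.",
    }),
)
```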
Both premises and conclusions are considered as propositions to which a truth value can be assigned. For this purpose an interpretation function, I, which maps from premises and conclusion to {0, 1}, can be stated. Based on I, the premises P and the conclusion c can be connected semantically. Recall in this regard the classical notion of entailment, which bases the concept of logical consequence on all possible interpretation functions: Given two propositional formulas α, β, then α entails β (denoted as α |= β) if and only if for all I holds:

    I(α) = 1  implies  I(β) = 1.        (1)

However, for our argument model (and for argumentation in natural language in general) this notion of entailment is not applicable: human language cannot be stuffed entirely into logical formulas; the detection of semantically equivalent argument units (which is necessary to transform formulas whose atoms correspond to argument units) belongs to the hardest NLP problems; truth entailment in natural language is not restricted to a recursive evaluation of truth values but comes in many different flavors such as argument from authority, analogical argument, or inductive argument; and so forth.

In any case, argumentation theory speaks of acceptability rather than truth, since truth is often unknown or not accessible (Wachsmuth et al., 2017a). The acceptability of an argument is subjective, which we capture as follows. Given an interpretation function I, propositional premises P, and a propositional conclusion c, then (c, P) is an acceptable argument if and only if the following holds:

    I(∧p∈P p) = 1  and  I(c) = 1.        (2)

Compared to the classical notion of entailment, the universality requirement regarding interpretation functions is relaxed. In this vein, (c, P) may be an argument for an individual, for a group, or for all beholders—depending on the respective I. Also, due to the aforementioned reasons, there is no simple structural means² that connects the interpretation of c to the interpretation of P: For participants in a debate the interpretation of the premises may be identical, but their mental models to determine the truth value of c, as well as the truth value itself, can differ.

² Except for the trivial case where c ∈ P.

The formalization of argument acceptability via interpretation functions as introduced above illustrates how a belief semantics for arguments can be formalized. However, the identification and classification of argument stance (as treated here as well as by other researchers) does not depend on individual interpretation functions. Arguments are formulated purposefully with respect to a thesis, which means that they are always dedicated to be used either as pro or as con argument—independent of the acceptability of a beholder.

To formalize the interesting argument decision problems, we consider a propositional thesis t, also called the "main claim", which encodes a particular "side" of a controversial issue. E.g., when referring to the introductory example, t may encode "Gay marriage is a great achievement.", but t may also encode "Gay marriage cannot be tolerated."³

³ Given a thesis t we can consider its opposite as antithesis.

Let A = {(c1, P1), (c2, P2), ..., (cn, Pn)} be a set of arguments related to t. Then we are also given an (implicitly defined) function σ, called "stance", which maps each argument A ∈ A either to pro or to con: σ encodes for which side of a controversial issue an argument is devised. A pro argument supports t; likewise, a con argument attacks t. Two arguments A1 and A2 have the same stance iff σ(A1) = σ(A2).

Using these definitions, among others the following decision problems can be stated. Given are a thesis t and a set of related arguments A.

• Πsameside. Decide for two arguments A1, A2 in A whether or not they have the same stance.

• Πstance. Decide for an argument A in A whether it has a pro or a con stance, i.e., whether σ(A) = pro or σ(A) = con.

Algorithmic stance classification as treated here means to learn the function σ from a set of examples.
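The relation between the two problems can be spelled out in a few lines of Python. The sketch is illustrative only: σ is hard-coded as a lookup table where a real system would use a learned classifier, and the argument identifiers are hypothetical.

```python
from typing import Callable

PRO, CON = "pro", "con"

# A toy stance function sigma, given as a lookup table for three
# (hypothetical) argument identifiers on some thesis t.
sigma: dict[str, str] = {"a1": PRO, "a2": PRO, "a3": CON}

def stance(arg_id: str) -> str:
    """Pi_stance: decide whether an argument is pro or con the thesis."""
    return sigma[arg_id]

def same_side(a1: str, a2: str,
              stance_fn: Callable[[str], str] = stance) -> bool:
    """Pi_sameside: two arguments are on the same side iff their stances
    coincide, i.e. sigma(A1) = sigma(A2). Any solver for Pi_stance thus
    induces a solver for Pi_sameside, but not vice versa."""
    return stance_fn(a1) == stance_fn(a2)

assert same_side("a1", "a2")      # both pro
assert not same_side("a1", "a3")  # pro vs. con
```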
3 Related Work

We have first mentioned same side stance classification as a potential task in the context of argument search (Ajjour et al., 2019). Some related previous research has been concerned with the agreement of different texts on a given topic (Menini et al., 2017). In computational argumentation, the task is new to our knowledge, which is why we restrict our view to the most related task in the following: stance classification.

Stance classification has drawn wide interest in the last decade. The problem has been studied for various linguistic genres, including online debates (Somasundaran and Wiebe, 2009; Hasan and Ng, 2013; Ranade et al., 2013), political debates (Vilares and He, 2017), tweets (Addawood et al., 2017; Mohammad et al., 2017), and spontaneous speech (Levow et al., 2014). Stance classification approaches have been motivated by different goals, such as fact checking (Bourgonje et al., 2017; Baly et al., 2018; Nadeem et al., 2019), enthymeme reconstruction (Rajendran et al., 2016), and knowledge graph building (Toledo-Ronen et al., 2016). The underlying methods concentrate on supervised learning. Among these, Bar-Haim et al. (2017) employ a support vector machine with multiple linguistic features, similar to those used in sentiment analysis. Iyyer et al. (2014) apply recursive neural networks, Augenstein et al. (2016) use a bidirectional LSTM, and Chen et al. (2018) implement a hybrid neural attention model. Unlike stance classification, the task we consider here widely abstracts from the topic on which a stance is expressed.

4 Dataset and Experiments

In the shared task we carried out, we devised two types of same side stance classification experiments: within a single topic and across two topics. The latter experiment type models the situation of a domain transfer and addresses the question of topic-agnostic classification. As topics we chose "gay marriage" and "abortion", and we sampled the respective argument datasets from the corpus underlying the argument search engine args.me (Wachsmuth et al., 2017b). The following subsections provide details about the dataset construction and the experiment setup.

4.1 Dataset

Because of its size and the balanced stance distribution, the args.me corpus provides a rich source for our experiments. At the time of the shared task, the corpus consisted of 387 606 arguments collected from 59 637 debates; a detailed description can be found in (Ajjour et al., 2019).⁴

⁴ The entire args.me corpus can be accessed here: https://webis.de/data.html#args-me

An argument in args.me is modeled as a conclusion along with a set of supporting premises. In addition, each premise is labeled with a stance, indicating whether it is "pro" or "con" the conclusion. The stances originate from the debates in which the arguments are used. Debates can be started from different viewpoints; for instance, one debate may discuss the viewpoint "abortion should be legalized" while another may discuss "abortion should be banned". Therefore, the stance of an argument has to be interpreted in relation to the arguments in the same debate.
During the acquisition process of the data for the shared task, we followed this constraint by ensuring that the two arguments of an argument pair always stem from the same debate.

The corpus contains 1 567 debates that treat "abortion" and 712 debates that treat "gay marriage". We filtered out those arguments whose premises are shorter than four words, since they are often meta statements such as "I win" or "I accept". As a result, we kept 9 426 arguments on abortion and 4 480 arguments on gay marriage for the task.

4.2 Experiments

Starting from the arguments in a debate, we generated all possible argument pairs. An argument pair was labeled as "Sameside" if both arguments are either "pro" or "con" the viewpoint of the debate; otherwise, the pair was labeled as "Diffside". Pairs of identical arguments were removed.
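The pair construction just described can be sketched as follows. This is our illustration, not the official task code, and the record layout is an assumption rather than the args.me schema: arguments are grouped by debate, all unordered pairs per debate are enumerated, identical pairs are dropped, and each remaining pair is labeled by comparing the two debate-relative stances.

```python
from itertools import combinations

# Toy records: (debate_id, argument_text, stance relative to the debate's
# viewpoint). The field layout is illustrative.
arguments = [
    ("d1", "Marriage is a commitment to love and care ...", "pro"),
    ("d1", "Denying some people the option to marry ...", "pro"),
    ("d1", "Marriage is the institution that forms ...", "con"),
]

def generate_pairs(arguments):
    """Build labeled argument pairs, one debate at a time."""
    by_debate = {}
    for debate_id, text, stance in arguments:
        by_debate.setdefault(debate_id, []).append((text, stance))
    pairs = []
    for debate_arguments in by_debate.values():
        for (t1, s1), (t2, s2) in combinations(debate_arguments, 2):
            if t1 == t2:  # pairs of identical arguments are removed
                continue
            label = "Sameside" if s1 == s2 else "Diffside"
            pairs.append((t1, t2, label))
    return pairs

for t1, t2, label in generate_pairs(arguments):
    print(f"{label}: {t1[:35]} | {t2[:35]}")
```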
Within-Topic Experiments  The within-topic experiments treat the two topics "abortion" and "gay marriage" independently of each other. The training sets each contain 67% of the argument pairs of one topic, chosen at random; the test sets were formed from the remaining 33% of the respective topic. Among others, it was ensured that the label of an argument pair in the test set cannot be transitively deduced.⁵ Note in this regard that the "same side" relation forms an equivalence relation. See Table 1 for the within-topic dataset statistics.

⁵ With transitive deduction we mean: SameSide(A1, A2) ∧ SameSide(A3, A2) ⊢ SameSide(A1, A3).

                     Training                       Test
  Class       Gay      Abortion   Σ          Gay   Abortion   Σ
  Sameside    13 277   20 834     34 111     63    63         126
  Diffside     9 786   20 006     29 792     63    63         126
  Σ           23 063   40 840     63 903     126   126        252

Table 1: Number of argument pairs in the training sets and test sets of the within-topic experiments.

Cross-Topics Experiment  The cross-topics experiment uses a different topic for training than for testing. In particular, the training set contains argument pairs from the "abortion" debates only, while the test set contains argument pairs from the "gay marriage" debates only. "Sameside" pairs and "Diffside" pairs are balanced. See Table 2 for the cross-topics dataset statistics.

  Class       Training: Abortion    Test: Gay
  Sameside    31 195                3 028
  Diffside    29 853                3 028
  Σ           61 048                6 056

Table 2: Number of argument pairs in the training and test set of the cross-topics experiment.

5 Submitted Systems and Results

Overall, nine teams participated in the first shared task on same side stance classification. This section provides a brief overview of the systems that the teams submitted, along with their results; Table 3 summarizes the numbers.

                         Within-Topic                                      Cross-Topics
                         Gay               Abortion          All
  Team                   Pre  Rec  Acc     Pre  Rec  Acc     Pre  Rec  Acc     Pre  Rec  Acc
  Trier University†      0.90 0.73 0.83    0.79 0.59 0.71    0.85 0.66 0.77    0.73 0.72 0.73
  Leipzig University     0.80 0.78 0.79    0.78 0.68 0.75    0.79 0.73 0.77    0.72 0.72 0.72
  IBM Research           0.73 0.63 0.70    0.64 0.54 0.62    0.69 0.59 0.66    0.62 0.49 0.60
  TU Darmstadt           0.74 0.56 0.68    0.63 0.48 0.60    0.68 0.52 0.64    0.64 0.59 0.63
  Düsseldorf University  0.76 0.35 0.62    0.65 0.32 0.57    0.70 0.33 0.60    0.72 0.53 0.66
  Trier University†      0.64 0.25 0.64    0.67 0.22 0.56    0.65 0.24 0.56    0.70 0.11 0.53
  LMU                    0.53 1.00 0.55    0.53 1.00 0.55    0.53 1.00 0.55    0.67 0.53 0.63
  MLU Halle‡             0.54 0.57 0.54    0.53 0.57 0.53    0.53 0.57 0.54    0.50 0.57 0.50
  Paderborn University   0.55 0.17 0.52    0.62 0.21 0.54    0.59 0.19 0.53    0.60 0.38 0.56
  University of Potsdam  0.46 0.54 0.45    0.56 0.62 0.56    0.51 0.58 0.51    0.51 0.52 0.51
  MLU Halle‡             0.47 0.11 0.49    0.54 0.11 0.51    0.50 0.11 0.50    0.46 0.00 0.50

Table 3: The results of the submissions for the within-topic experiments and the cross-topics experiment in terms of precision (Pre), recall (Rec), and accuracy (Acc). For both Trier University† and MLU Halle‡, the best and the worst results are reported since these teams submitted multiple systems.

Düsseldorf University  The system submitted by Düsseldorf University relies on a Siamese network trained to predict the similarity of two arguments on top of a small BERT (Devlin et al., 2018). As the maximum input length for BERT is 512 tokens, a relevance selection component that ranks sentences by relevance is integrated, cutting the ranked input off at 512 tokens. The system achieved an accuracy of 60% on the within-topic task and 66% across topics.

IBM Research  The system submitted by IBM is based on a small vanilla BERT model that has first been fine-tuned to perform standard binary pro/con stance classification on data extracted from the IBM Debater project. On top of this model, another model is initialized and fine-tuned on the same side classification task. The system obtained results inverse to those of Düsseldorf University: 66% accuracy in the within-topic setting and 60% in the cross-topics setting.

Leipzig University  The system submitted by Leipzig University uses a pre-trained BERT model that is fine-tuned on the same side stance classification task, using a binary classification layer with one output and a cross-entropy loss instead of a multilabel classification layer. To embed an argument, the first 254 tokens of the argument are fed through the BERT model; then, the last 254 tokens are embedded. The concatenation of both embeddings is fed into the classification layer (see the sketch below). The system achieved an accuracy of 77% in the within-topic setting and 72% in the cross-topics setting.
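A minimal sketch of this head-and-tail encoding, reconstructed from the system description rather than taken from the team's code; it assumes the HuggingFace transformers library and uses the [CLS] vector of each half as its embedding.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def embed_argument(text: str, window: int = 254) -> torch.Tensor:
    """Embed an argument by encoding its first and its last `window`
    tokens separately and concatenating the two [CLS] vectors.
    (For arguments shorter than `window` tokens the halves overlap.)"""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    vectors = []
    for chunk in (ids[:window], ids[-window:]):
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        with torch.no_grad():
            output = bert(input_ids=input_ids)
        vectors.append(output.last_hidden_state[0, 0])  # [CLS] vector
    return torch.cat(vectors)  # 2 x 768 dimensions

# The embeddings of the two arguments of a pair would then be
# concatenated and passed to a single-output classification layer.
```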
LMU  The system submitted by the Ludwig Maximilian University (LMU) relies on a vanilla pre-trained BERT base model that is fine-tuned to the shared task. The data is organized in a graph, with one graph per topic: nodes represent arguments, and edges are labeled with the confidence that the associated arguments agree with each other. This graph-based approach has the benefit that more training data can be generated by a transitive closure. Its accuracy was 55% in the within-topic setting and 63% in the cross-topics setting.

MLU Halle  The submission of the Martin-Luther-University (MLU) of Halle-Wittenberg consists of three systems. The first uses a tree-based learning algorithm as classifier with standard bag-of-words features. The second is a rule-based approach that reduces the task to sentiment classification, relying on rules defined over lists of words whose polarity is taken from a sentiment lexicon. The third is a re-implementation of the stance classification approach of Bar-Haim et al. (2017). The best of the three systems achieves an accuracy of 54% in the within-topic setting and 50% in the cross-topics setting.

Paderborn University  The system submitted by Paderborn University relies on a Siamese neural network that maps arguments to a new space in which arguments with the same stance are closer to each other, and other arguments are less close. Arguments are represented by the contextual word embeddings provided by the Flair library (Akbik et al., 2018). A final sigmoid activation function produces the output used for same side stance classification. The system achieved an accuracy of 53% within topics and 56% across topics.

Trier University  The system submitted by Trier University relies on a pre-trained BERT base model fine-tuned to the shared task and was submitted in several configurations. The best configuration yielded an accuracy of 77% in the within-topic setting and 73% in the cross-topics setting, the worst 56% and 53%, respectively.

TU Darmstadt  The system submitted by TU Darmstadt relies on a multi-task deep network on the basis of the pre-trained large BERT model. The network is trained on a number of pro/con stance classification datasets in addition to the shared task dataset. The system achieved an accuracy of 64% in the within-topic setting and 63% in the cross-topics setting.

University of Potsdam  The system submitted by the University of Potsdam relies on bidirectional LSTMs to encode the arguments. The embeddings of both arguments are concatenated, multiplied in an element-wise fashion, and subtracted, and the resulting features are fed into a two-layer MLP as a classification layer (sketched below). The system achieved 51% accuracy both within and across topics.
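This combination of concatenation, element-wise product, and difference is the matching scheme familiar from sentence-pair models. The following PyTorch sketch shows one possible reading of such an architecture; the single-layer BiLSTM and all dimensions are our assumptions, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class SameSidePairClassifier(nn.Module):
    """BiLSTM argument encoder with [u; v; u*v; u-v] pair features
    feeding a two-layer MLP that outputs a same-side probability."""

    def __init__(self, embed_dim=100, hidden=128, vocab_size=10_000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.mlp = nn.Sequential(
            nn.Linear(8 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def encode(self, token_ids):
        # Concatenate the final forward and backward hidden states.
        _, (h, _) = self.encoder(self.embed(token_ids))
        return torch.cat([h[0], h[1]], dim=-1)

    def forward(self, arg1_ids, arg2_ids):
        u, v = self.encode(arg1_ids), self.encode(arg2_ids)
        features = torch.cat([u, v, u * v, u - v], dim=-1)
        return self.mlp(features)

model = SameSidePairClassifier()
a1 = torch.randint(0, 10_000, (2, 40))   # batch of 2 token-id sequences
a2 = torch.randint(0, 10_000, (2, 35))
print(model(a1, a2).shape)               # torch.Size([2, 1])
```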
6 Discussion and Outlook

The results of the shared task license a number of interesting conclusions. First of all, the results have validated our hypothesis that a topic-agnostic approach to same side stance classification is feasible. This is clearly conveyed by the fact that the within-topic and the cross-topics settings seem to be of similar complexity. Also, the differences in accuracy on the two tasks are less than 5–6 percentage points, additionally corroborating the hypothesis.

A second conclusion is that the effectiveness of most systems clearly improves over a random baseline, showing that the task is generally feasible. At the same time, however, the results show that there is potential for improvement.

As for other tasks in the field of argumentation, such as the Argument Reasoning Comprehension Task, ARCT (Habernal et al., 2018), encoder-based models seem to reach the top results. In fact, all of the top-5 performing systems on our task (Trier University, Leipzig University, IBM Research, TU Darmstadt, and Düsseldorf University) rely on a BERT model. They differ mainly in the way the input is encoded. As the length of the input arguments exceeds the maximum input length of BERT models, the participants explored and proposed different approaches, such as encoding the beginning and the end of the arguments separately and then concatenating these encodings, or implementing a relevance ranking system to encode only the most relevant sentences of an argument. In any case, the encoding strategy seems to have a clear impact on the results and thus deserves further investigation.

For related tasks, e.g. the ARCT, it has been found recently that encoder-based models seem to pick up surface cues and artifacts of the dataset, and that they are not really able to learn a model that shows a deeper understanding of how arguments work. It is up to further investigation whether the same side stance classification task also bears the potential for such artifacts that can be picked up by a system. It would be interesting to investigate which task the encoder-based models actually learn to solve.

References

Aseel Addawood, Jodi Schneider, and Masooda Bashir. 2017. Stance classification of Twitter debates: The encryption debate as a use case. In 8th International Conference on Social Media and Society, ACM International Conference Proceeding Series. Association for Computing Machinery.

Yamen Ajjour, Henning Wachsmuth, Johannes Kiesel, Martin Potthast, Matthias Hagen, and Benno Stein. 2019. Data acquisition for argument search: The args.me corpus. In 42nd German Conference on Artificial Intelligence (KI 2019). Springer.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance detection with bidirectional conditional encoding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 876–885. Association for Computational Linguistics.

Ramy Baly, Mitra Mohtarami, James Glass, Lluís Màrquez, Alessandro Moschitti, and Preslav Nakov. 2018. Integrating stance detection and fact checking in a unified corpus. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 21–27. Association for Computational Linguistics.

Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. 2017. Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 251–261. Association for Computational Linguistics.

Peter Bourgonje, Julian Moreno Schneider, and Georg Rehm. 2017. From clickbait to fake news detection: An approach based on detecting the stance of headlines to articles. In Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, pages 84–89. Association for Computational Linguistics.

Di Chen, Jiachen Du, Lidong Bing, and Ruifeng Xu. 2018. Hybrid neural attention for agreement/disagreement inference in online debates. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 665–670. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ivan Habernal, Henning Wachsmuth, Iryna Gurevych, and Benno Stein. 2018. SemEval-2018 task 12: The argument reasoning comprehension task. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 763–772, New Orleans, Louisiana. Association for Computational Linguistics.

Kazi Saidul Hasan and Vincent Ng. 2013. Stance classification of ideological debates: Data, models, features, and constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1348–1356. Asian Federation of Natural Language Processing.

Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. 2014. Political ideology detection using recursive neural networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1113–1122. Association for Computational Linguistics.
G. Levow, V. Freeman, A. Hrynkevich, M. Ostendorf, R. Wright, J. Chan, Y. Luan, and T. Tran. 2014. Recognition of stance strength and polarity in spontaneous speech. In 2014 IEEE Spoken Language Technology Workshop (SLT), pages 236–241.

Stefano Menini, Federico Nanni, Simone Paolo Ponzetto, and Sara Tonelli. 2017. Topic-based agreement and disagreement in US electoral manifestos. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2938–2944. Association for Computational Linguistics.

Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and sentiment in tweets. ACM Transactions on Internet Technology, 17(3).

Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An automatic end-to-end fact checking system. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 78–83. Association for Computational Linguistics.

Pavithra Rajendran, Danushka Bollegala, and Simon Parsons. 2016. Contextual stance classification of opinions: A step towards enthymeme reconstruction in online reviews. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 31–39. Association for Computational Linguistics.

Sarvesh Ranade, Rajeev Sangal, and Radhika Mamidi. 2013. Stance classification in online debates by recognizing users' intentions. In Proceedings of the SIGDIAL 2013 Conference, pages 61–69. Association for Computational Linguistics.

Swapna Somasundaran and Janyce Wiebe. 2009. Recognizing stances in online debates. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 226–234. Association for Computational Linguistics.

Orith Toledo-Ronen, Roy Bar-Haim, and Noam Slonim. 2016. Expert stance graphs for computational argumentation. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 119–123. Association for Computational Linguistics.

David Vilares and Yulan He. 2017. Detecting perspectives in political debates. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1573–1582. Association for Computational Linguistics.

Henning Wachsmuth, Nona Naderi, Yufang Hou, Yonatan Bilu, Vinodkumar Prabhakaran, Tim Alberdingk Thijm, Graeme Hirst, and Benno Stein. 2017a. Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 176–187. Association for Computational Linguistics.

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, and Benno Stein. 2017b. Building an argument search engine for the web. In Proceedings of the 4th Workshop on Argument Mining, pages 49–59. Association for Computational Linguistics.