Same Side Stance Classification

Benno Stein, Yamen Ajjour, Roxanne El Baff, Khalid Al-Khatib
Bauhaus-Universität Weimar, Faculty of Media, Webis Group
.@uni-weimar.de

Philipp Cimiano, Bielefeld University, AG Semantic Computing, cimiano@cit-ec.uni-bielefeld.de
Henning Wachsmuth, Paderborn University, Department of Computer Science, henningw@upb.de

Abstract

This paper introduces the Same Side Stance Classification problem and reports on the outcome of a related shared task, which has been collocated with the Sixth Workshop on Argument Mining at the ACL 2019 in Florence.¹ We have proposed this task as a variant of the well-known stance classification task: Instead of predicting for a single argument whether it has a positive or negative stance towards a given topic, same side classification 'merely' involves the prediction of whether two given arguments share the same stance. The paper in hand provides the rationale for proposing this task, overviews important related work, describes the developed datasets, and reports on the results along with the main methods of the nine submitted systems. We draw conclusions from these results with respect to the suitability of the task as a proxy for measuring progress in the field of argument mining.

¹ https://sameside.webis.de/

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Identifying (i.e., classifying) the stance of an argument towards a particular topic is a fundamental task in computational argumentation and argument mining. The stance of an argument as considered here is a two-valued function: it can either be "pro" a topic (meaning, "yes, I agree"), or "con" the topic ("no, I do not agree").

Here we propose a related though simpler task, called same side stance classification (later also referred to as Πsameside). Same side stance classification deals with the problem of classifying two arguments as to whether they (a) share the same stance or (b) have a different stance towards the topic in question.

As an example, consider the following two arguments on the topic "gay marriage", which obviously are on the same side.

Argument 1. Marriage is a commitment to love and care for your spouse till death. This is what is heard in all wedding vows. Gays can clearly qualify for marriage according to these vows, and any definition of marriage deduced from these vows.

Argument 2. Gay Marriage should be legalized since denying some people the option to marry is discriminatory and creates a second class of citizens.

Argument 3 below, however, is neither on the side of Argument 1 nor on the side of Argument 2.

Argument 3. Marriage is the institution that forms and upholds for society, its values and symbols are related to procreation. To change the definition of marriage to include same-sex couples would destroy its function, because it could no longer represent the inherently procreative relationship of opposite-sex pair-bonding.

Same side stance classification is simpler than the "classical" stance classification problem, or at most equally complex: solving the latter implies solving the former as well.
Aside from the difference in problem complexity, a second aspect renders same side stance classification a relevant task in its own right: Stance classification, by definition, requires knowledge about the topic that an argument is meant to address, i.e., stance classifiers must be trained for a particular topic and hence cannot be reliably applied to other (i.e., across) topics. In contrast, a same side stance classifier does not necessarily need to distinguish between topic-specific pro- and con-vocabulary; "merely" the argument similarity within a stance needs to be assessed. Consequently, same side stance classification is likely to be solvable independently of a topic or a domain—so to speak, in a topic-agnostic fashion. Since topic agnosticity is a big step towards application robustness and flexibility, we believe that the development of technologies that tackle this task has game-changing potential.

Last but not least, same side stance classification has a number of useful and important applications related to both argumentation analytics and information retrieval, including but not limited to the following:

• Measuring the strength of bias within an argumentative utterance (analytics).

• Structuring a discussion (analytics).

• Finding out who or what is challenging in a discussion (analytics, retrieval).

• Filtering wrongly-labeled arguments in a large argument corpus, without relying on knowledge of a topic or a domain (retrieval).

To initiate research on same side stance classification, we carried out a first respective shared task in collocation with the Sixth Workshop on Argument Mining at ACL 2019. We report on this shared task and its results in the paper in hand.

The remainder is organized as follows. Section 2 formalizes the same side stance classification task and relates it to other problems in the field. Section 3 points to relevant research and suggested readings related to stance classification. Section 4 describes the dataset and the experiment settings of the shared task. Section 5 reports on the systems of the nine participating teams and their effectiveness. Section 6 concludes with the lessons learned and the planned follow-up research.

2 Argument Decision Problems

The same side stance classification task, Πsameside, is a decision task in the field of computational argumentation. As outlined in Section 1, mastering this task is beneficial in the context of argumentation analytics and information retrieval. This section provides a succinct formalization of the problem.

The syntax of the argument model underlying Πsameside is rather simple but well-accepted: An argument consists of a conclusion, c, and a set (a conjunction) of premises, P.
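To make this model concrete, here is a minimal Python sketch of the argument representation. It is illustrative only; the class name Argument and the field names are ours and not part of the formalization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    """An argument (c, P): a conclusion c and a set of premises P."""
    conclusion: str
    premises: frozenset[str]

# Illustrative instance, paraphrasing Argument 2 from the introduction:
a2 = Argument(
    conclusion="Gay marriage should be legalized.",
    premises=frozenset({
        "Denying some people the option to marry is discriminatory.",
        "Denying some people the option to marry creates a second class of citizens.",
    }),
)
```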
Both premises and conclusions are considered as propositions to which a truth value can be assigned. For this purpose an interpretation function, I, which maps from premises and conclusion to {0, 1}, can be stated. Based on I, the premises P and the conclusion c can be connected semantically. Recall in this regard the classical notion of entailment, which bases the concept of logical consequence on all possible interpretation functions: Given two propositional formulas α, β, then α entails β (denoted as α |= β) if and only if for all I holds:

    I(α) = 1  implies  I(β) = 1.        (1)

However, for our argument model (and for argumentation in natural language in general) this notion of entailment is not applicable: human language cannot be stuffed entirely into logical formulas; the detection of semantically equivalent argument units (which is necessary to transform formulas whose atoms correspond to argument units) belongs to the hardest NLP problems; truth entailment in natural language is not restricted to a recursive evaluation of truth values but comes in many different flavors such as argument from authority, analogical argument, or inductive argument; and so forth.

In any case, argumentation theory speaks of acceptability rather than truth, since truth is often unknown or not accessible (Wachsmuth et al., 2017a). The acceptability of an argument is subjective, which we capture as follows. Given an interpretation function I, propositional premises P, and a propositional conclusion c, then (c, P) is an acceptable argument if and only if the following holds:

    I(∧p∈P p) = 1  and  I(c) = 1.        (2)

Compared to the classical notion of entailment, the universality requirement regarding interpretation functions is relaxed. In this vein, (c, P) may be an argument for an individual, for a group, or for all beholders—depending on the respective I. Also, due to the aforementioned reasons, there is no simple structural means² that connects the interpretation of c to the interpretation of P: For participants in a debate the interpretation of the premises may be identical, but their mental models to determine the truth value of c, as well as the truth value itself, can differ.

² Except for the trivial case where c ∈ P.

The formalization of argument acceptability via interpretation functions as introduced above illustrates how a belief semantics for arguments can be formalized. However, the identification and classification of argument stance (as treated here as well as by other researchers) does not depend on individual interpretation functions. Arguments are formulated purposefully with respect to a thesis, which means that they are always dedicated to be used either as pro or as con argument—independent of the acceptability of a beholder.

To formalize the interesting argument decision problems, we consider a propositional thesis t, also called the "main claim", which encodes a particular "side" of a controversial issue. E.g., when referring to the introductory example, t may encode "Gay marriage is a great achievement.", but t may also encode "Gay marriage cannot be tolerated."³

³ Given a thesis t we can consider its opposite as antithesis.

Let A = {(c1, P1), (c2, P2), ..., (cn, Pn)} be a set of arguments related to t. Then we are also given an (implicitly defined) function σ, called "stance", which maps each argument A ∈ A either to pro or to con: σ encodes for which side of a controversial issue an argument is devised. A pro argument supports t; likewise, a con argument attacks t. Two arguments A1 and A2 have the same stance iff σ(A1) = σ(A2).

Using these definitions, among others the following decision problems can be stated. Given are a thesis t and a set of related arguments A.

• Πsameside. Decide for two arguments A1, A2 in A whether or not they have the same stance.

• Πstance. Decide for an argument A in A whether it has a pro or a con stance, i.e., whether σ(A) = pro or σ(A) = con.

Algorithmic stance classification as treated here means to learn the function σ from a set of examples.
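The relation between the two problems can be spelled out in a few lines of Python. The sketch is illustrative only: σ is hard-coded as a lookup table where a real system would use a learned classifier, and the argument identifiers are hypothetical.

```python
from typing import Callable

PRO, CON = "pro", "con"

# A toy stance function sigma, given as a lookup table for three
# (hypothetical) argument identifiers on some thesis t.
sigma: dict[str, str] = {"a1": PRO, "a2": PRO, "a3": CON}

def stance(arg_id: str) -> str:
    """Pi_stance: decide whether an argument is pro or con the thesis."""
    return sigma[arg_id]

def same_side(a1: str, a2: str,
              stance_fn: Callable[[str], str] = stance) -> bool:
    """Pi_sameside: two arguments are on the same side iff their stances
    coincide, i.e. sigma(A1) = sigma(A2). Any solver for Pi_stance thus
    induces a solver for Pi_sameside, but not vice versa."""
    return stance_fn(a1) == stance_fn(a2)

assert same_side("a1", "a2")      # both pro
assert not same_side("a1", "a3")  # pro vs. con
```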
3 Related Work

We have first mentioned same side stance classification as a potential task in the context of argument search (Ajjour et al., 2019). Some related previous research has been concerned with the agreement of different texts on a given topic (Menini et al., 2017). In computational argumentation, the task is new to our knowledge, which is why we restrict our view to the most related task in the following: stance classification.

Stance classification has drawn wide interest in the last decade. The problem has been studied for various linguistic genres, including online debates (Somasundaran and Wiebe, 2009; Hasan and Ng, 2013; Ranade et al., 2013), political debates (Vilares and He, 2017), tweets (Addawood et al., 2017; Mohammad et al., 2017), and spontaneous speech (Levow et al., 2014). Stance classification approaches have been motivated by different goals, such as fact checking (Bourgonje et al., 2017; Baly et al., 2018; Nadeem et al., 2019), enthymeme reconstruction (Rajendran et al., 2016), and knowledge graph building (Toledo-Ronen et al., 2016). The underlying methods concentrate on supervised learning. Among these, Bar-Haim et al. (2017) employ a support vector machine with multiple linguistic features, similar to those used in sentiment analysis. Iyyer et al. (2014) apply recursive neural networks, Augenstein et al. (2016) use a bidirectional LSTM, and Chen et al. (2018) implement a hybrid neural attention model. Unlike stance classification, the task we consider here widely abstracts from the topic on which a stance is expressed.

4 Dataset and Experiments

In the shared task we carried out, we devised two types of same side stance classification experiments: within a single topic and across two topics. The latter experiment type models the situation of a domain transfer and addresses the question of topic-agnostic classification. As topics we chose "gay marriage" and "abortion", and we sampled the respective argument datasets from the corpus underlying the argument search engine args.me (Wachsmuth et al., 2017b). The following subsections provide details about the dataset construction and the experiment setup.

4.1 Dataset

Because of its size and the balanced stance distribution, the args.me corpus provides a rich source for our experiments. At the time of the shared task, the corpus consisted of 387 606 arguments collected from 59 637 debates; a detailed description can be found in (Ajjour et al., 2019).⁴

⁴ The entire args.me corpus can be accessed here: https://webis.de/data.html#args-me

An argument in args.me is modeled as a conclusion along with a set of supporting premises. In addition, each premise is labeled with a stance, indicating whether it is "pro" or "con" the conclusion. The stances originate from the debates in which the arguments are used. Debates can be started from different viewpoints; for instance, one debate may discuss the viewpoint "abortion should be legalized" while another may discuss "abortion should be banned". Therefore, the stance of an argument has to be interpreted in relation to the arguments in the same debate.
During the acquisition process of the data for the shared task, we followed this constraint by ensuring that the two arguments of an argument pair always stem from the same debate.

The corpus contains 1 567 debates that treat "abortion" and 712 debates that treat "gay marriage". We filtered out those arguments whose premises are shorter than four words, since they are often meta statements such as "I win" or "I accept". As a result, we kept 9 426 arguments on abortion and 4 480 arguments on gay marriage for the task.

4.2 Experiments

Starting from the arguments in a debate, we generated all possible argument pairs. An argument pair was labeled as "Sameside" if both arguments are either "pro" or "con" the viewpoint of the debate; otherwise, the pair was labeled as "Diffside". Pairs of identical arguments were removed.
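The pair construction just described can be sketched as follows. This is our illustration, not the official task code, and the record layout is an assumption rather than the args.me schema: arguments are grouped by debate, all unordered pairs per debate are enumerated, identical pairs are dropped, and each remaining pair is labeled by comparing the two debate-relative stances.

```python
from itertools import combinations

# Toy records: (debate_id, argument_text, stance relative to the debate's
# viewpoint). The field layout is illustrative.
arguments = [
    ("d1", "Marriage is a commitment to love and care ...", "pro"),
    ("d1", "Denying some people the option to marry ...", "pro"),
    ("d1", "Marriage is the institution that forms ...", "con"),
]

def generate_pairs(arguments):
    """Build labeled argument pairs, one debate at a time."""
    by_debate = {}
    for debate_id, text, stance in arguments:
        by_debate.setdefault(debate_id, []).append((text, stance))
    pairs = []
    for debate_arguments in by_debate.values():
        for (t1, s1), (t2, s2) in combinations(debate_arguments, 2):
            if t1 == t2:  # pairs of identical arguments are removed
                continue
            label = "Sameside" if s1 == s2 else "Diffside"
            pairs.append((t1, t2, label))
    return pairs

for t1, t2, label in generate_pairs(arguments):
    print(f"{label}: {t1[:35]} | {t2[:35]}")
```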
Within-Topic Experiments  The within-topic experiments treat the two topics "abortion" and "gay marriage" independently of each other. The training sets each contain 67% of the argument pairs of one topic, chosen at random; the test sets were formed from the remaining 33% of the respective topic. Among others, it was ensured that the label of an argument pair in the test set cannot be transitively deduced.⁵ Note in this regard that the "same side" relation forms an equivalence relation. See Table 1 for the within-topic dataset statistics.

⁵ With transitive deduction we mean: SameSide(A1, A2) ∧ SameSide(A3, A2) ⊢ SameSide(A1, A3).

                     Training                       Test
  Class       Gay      Abortion   Σ          Gay   Abortion   Σ
  Sameside    13 277   20 834     34 111     63    63         126
  Diffside     9 786   20 006     29 792     63    63         126
  Σ           23 063   40 840     63 903     126   126        252

Table 1: Number of argument pairs in the training sets and test sets of the within-topic experiments.

Cross-Topics Experiment  The cross-topics experiment uses a different topic for training than for testing. In particular, the training set contains argument pairs from the "abortion" debates only, while the test set contains argument pairs from the "gay marriage" debates only. "Sameside" pairs and "Diffside" pairs are balanced. See Table 2 for the cross-topics dataset statistics.

  Class       Training: Abortion    Test: Gay
  Sameside    31 195                3 028
  Diffside    29 853                3 028
  Σ           61 048                6 056

Table 2: Number of argument pairs in the training and test set of the cross-topics experiment.

5 Submitted Systems and Results

Overall, nine teams participated in the first shared task on same side stance classification. This section provides a brief overview of the systems that the teams submitted, along with their results; Table 3 summarizes the numbers.

                         Within-Topic                                      Cross-Topics
                         Gay               Abortion          All
  Team                   Pre  Rec  Acc     Pre  Rec  Acc     Pre  Rec  Acc     Pre  Rec  Acc
  Trier University†      0.90 0.73 0.83    0.79 0.59 0.71    0.85 0.66 0.77    0.73 0.72 0.73
  Leipzig University     0.80 0.78 0.79    0.78 0.68 0.75    0.79 0.73 0.77    0.72 0.72 0.72
  IBM Research           0.73 0.63 0.70    0.64 0.54 0.62    0.69 0.59 0.66    0.62 0.49 0.60
  TU Darmstadt           0.74 0.56 0.68    0.63 0.48 0.60    0.68 0.52 0.64    0.64 0.59 0.63
  Düsseldorf University  0.76 0.35 0.62    0.65 0.32 0.57    0.70 0.33 0.60    0.72 0.53 0.66
  Trier University†      0.64 0.25 0.64    0.67 0.22 0.56    0.65 0.24 0.56    0.70 0.11 0.53
  LMU                    0.53 1.00 0.55    0.53 1.00 0.55    0.53 1.00 0.55    0.67 0.53 0.63
  MLU Halle‡             0.54 0.57 0.54    0.53 0.57 0.53    0.53 0.57 0.54    0.50 0.57 0.50
  Paderborn University   0.55 0.17 0.52    0.62 0.21 0.54    0.59 0.19 0.53    0.60 0.38 0.56
  University of Potsdam  0.46 0.54 0.45    0.56 0.62 0.56    0.51 0.58 0.51    0.51 0.52 0.51
  MLU Halle‡             0.47 0.11 0.49    0.54 0.11 0.51    0.50 0.11 0.50    0.46 0.00 0.50

Table 3: The results of the submissions for the within-topic experiments and the cross-topics experiment in terms of precision (Pre), recall (Rec), and accuracy (Acc). For both Trier University† and MLU Halle‡, the best and the worst results are reported since these teams submitted multiple systems.

Düsseldorf University  The system submitted by Düsseldorf University relies on a Siamese network trained to predict the similarity of two arguments on top of a small BERT (Devlin et al., 2018). As the maximum input length for BERT is 512 tokens, a relevance selection component that ranks sentences by relevance is integrated, cutting the ranked input off at 512 tokens. The system achieved an accuracy of 60% on the within-topic task and 66% across topics.

IBM Research  The system submitted by IBM is based on a small vanilla BERT model that has first been fine-tuned to perform standard binary pro/con stance classification on data extracted from the IBM Debater project. On top of this model, another model is initialized and fine-tuned on the same side classification task. The system obtained results inverse to those of Düsseldorf University: 66% accuracy in the within-topic setting and 60% in the cross-topics setting.

Leipzig University  The system submitted by Leipzig University uses a pre-trained BERT model that is fine-tuned on the same side stance classification task, using a binary classification layer with one output and a cross-entropy loss instead of a multilabel classification layer. To embed an argument, the first 254 tokens of the argument are fed through the BERT model; then, the last 254 tokens are embedded. The concatenation of both embeddings is fed into the classification layer (see the sketch below). The system achieved an accuracy of 77% in the within-topic setting and 72% in the cross-topics setting.
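A minimal sketch of this head-and-tail encoding, reconstructed from the system description rather than taken from the team's code; it assumes the HuggingFace transformers library and uses the [CLS] vector of each half as its embedding.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def embed_argument(text: str, window: int = 254) -> torch.Tensor:
    """Embed an argument by encoding its first and its last `window`
    tokens separately and concatenating the two [CLS] vectors.
    (For arguments shorter than `window` tokens the halves overlap.)"""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    vectors = []
    for chunk in (ids[:window], ids[-window:]):
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        with torch.no_grad():
            output = bert(input_ids=input_ids)
        vectors.append(output.last_hidden_state[0, 0])  # [CLS] vector
    return torch.cat(vectors)  # 2 x 768 dimensions

# The embeddings of the two arguments of a pair would then be
# concatenated and passed to a single-output classification layer.
```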
LMU  The system submitted by the Ludwig Maximilian University (LMU) relies on a vanilla pre-trained BERT base model that is fine-tuned to the shared task. The data is organized in a graph, with one graph per topic: nodes represent arguments, and edges are labeled with the confidence that the associated arguments agree with each other. This graph-based approach has the benefit that more training data can be generated by a transitive closure. Its accuracy was 55% in the within-topic setting and 63% in the cross-topics setting.

MLU Halle  The submission of the Martin-Luther-University (MLU) of Halle-Wittenberg consists of three systems. The first uses a tree-based learning algorithm as classifier with standard bag-of-words features. The second is a rule-based approach that reduces the task to sentiment classification, relying on rules defined over lists of words whose polarity is taken from a sentiment lexicon. The third is a re-implementation of the stance classification approach of Bar-Haim et al. (2017). The best of the three systems achieves an accuracy of 54% in the within-topic setting and 50% in the cross-topics setting.

Paderborn University  The system submitted by Paderborn University relies on a Siamese neural network that maps arguments to a new space in which arguments with the same stance are closer to each other, and other arguments are less close. Arguments are represented by the contextual word embeddings provided by the Flair library (Akbik et al., 2018). A final sigmoid activation function produces the output used for same side stance classification. The system achieved an accuracy of 53% within topics and 56% across topics.

Trier University  The system submitted by Trier University relies on a pre-trained BERT base model fine-tuned to the shared task and was submitted in several configurations. The best configuration yielded an accuracy of 77% in the within-topic setting and 73% in the cross-topics setting, the worst 56% and 53%, respectively.

TU Darmstadt  The system submitted by TU Darmstadt relies on a multi-task deep network on the basis of the pre-trained large BERT model. The network is trained on a number of pro/con stance classification datasets in addition to the shared task dataset. The system achieved an accuracy of 64% in the within-topic setting and 63% in the cross-topics setting.

University of Potsdam  The system submitted by the University of Potsdam relies on bidirectional LSTMs to encode the arguments. The embeddings of both arguments are concatenated, multiplied in an element-wise fashion, and subtracted, and the resulting features are fed into a two-layer MLP as a classification layer (sketched below). The system achieved 51% accuracy both within and across topics.
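This combination of concatenation, element-wise product, and difference is the matching scheme familiar from sentence-pair models. The following PyTorch sketch shows one possible reading of such an architecture; the single-layer BiLSTM and all dimensions are our assumptions, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class SameSidePairClassifier(nn.Module):
    """BiLSTM argument encoder with [u; v; u*v; u-v] pair features
    feeding a two-layer MLP that outputs a same-side probability."""

    def __init__(self, embed_dim=100, hidden=128, vocab_size=10_000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.mlp = nn.Sequential(
            nn.Linear(8 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def encode(self, token_ids):
        # Concatenate the final forward and backward hidden states.
        _, (h, _) = self.encoder(self.embed(token_ids))
        return torch.cat([h[0], h[1]], dim=-1)

    def forward(self, arg1_ids, arg2_ids):
        u, v = self.encode(arg1_ids), self.encode(arg2_ids)
        features = torch.cat([u, v, u * v, u - v], dim=-1)
        return self.mlp(features)

model = SameSidePairClassifier()
a1 = torch.randint(0, 10_000, (2, 40))   # batch of 2 token-id sequences
a2 = torch.randint(0, 10_000, (2, 35))
print(model(a1, a2).shape)               # torch.Size([2, 1])
```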
6 Discussion and Outlook

The results of the shared task license a number of interesting conclusions. First of all, the results have validated our hypothesis that a topic-agnostic approach to same side stance classification is feasible. This is clearly conveyed by the fact that the within-topic and the cross-topics settings seem to be of similar complexity. Also, the differences in accuracy on the two tasks are less than 5–6 percentage points, additionally corroborating the hypothesis.

A second conclusion is that the effectiveness of most systems clearly improves over a random baseline, showing that the task is generally feasible. At the same time, however, the results show that there is potential for improvement.

As for other tasks in the field of argumentation, such as the Argument Reasoning Comprehension Task, ARCT (Habernal et al., 2018), encoder-based models seem to reach the top results. In fact, all of the top-5 performing systems on our task (Trier University, Leipzig University, IBM Research, TU Darmstadt, and Düsseldorf University) rely on a BERT model. They differ mainly in the way the input is encoded. As the length of the input arguments exceeds the maximum input length of BERT models, the participants explored and proposed different approaches, such as encoding the beginning and the end of the arguments separately and then concatenating these encodings, or implementing a relevance ranking system to encode only the most relevant sentences of an argument. In any case, the encoding strategy seems to have a clear impact on the results and thus deserves further investigation.

For related tasks, e.g. the ARCT, it has been found recently that encoder-based models seem to pick up surface cues and artifacts of the dataset, and that they are not really able to learn a model that shows a deeper understanding of how arguments work. It is up to further investigation whether the same side stance classification task also bears the potential for such artifacts that can be picked up by a system. It would be interesting to investigate which task the encoder-based models actually learn to solve.

References

Aseel Addawood, Jodi Schneider, and Masooda Bashir. 2017. Stance classification of Twitter debates: The encryption debate as a use case. In 8th International Conference on Social Media and Society, ACM International Conference Proceeding Series. Association for Computing Machinery.

Yamen Ajjour, Henning Wachsmuth, Johannes Kiesel, Martin Potthast, Matthias Hagen, and Benno Stein. 2019. Data acquisition for argument search: The args.me corpus. In 42nd German Conference on Artificial Intelligence (KI 2019). Springer.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance detection with bidirectional conditional encoding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 876–885. Association for Computational Linguistics.

Ramy Baly, Mitra Mohtarami, James Glass, Lluís Màrquez, Alessandro Moschitti, and Preslav Nakov. 2018. Integrating stance detection and fact checking in a unified corpus. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 21–27. Association for Computational Linguistics.

Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. 2017. Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 251–261. Association for Computational Linguistics.

Peter Bourgonje, Julian Moreno Schneider, and Georg Rehm. 2017. From clickbait to fake news detection: An approach based on detecting the stance of headlines to articles. In Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, pages 84–89. Association for Computational Linguistics.

Di Chen, Jiachen Du, Lidong Bing, and Ruifeng Xu. 2018. Hybrid neural attention for agreement/disagreement inference in online debates. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 665–670. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ivan Habernal, Henning Wachsmuth, Iryna Gurevych, and Benno Stein. 2018. SemEval-2018 task 12: The argument reasoning comprehension task. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 763–772, New Orleans, Louisiana. Association for Computational Linguistics.

Kazi Saidul Hasan and Vincent Ng. 2013. Stance classification of ideological debates: Data, models, features, and constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1348–1356. Asian Federation of Natural Language Processing.

Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. 2014. Political ideology detection using recursive neural networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1113–1122. Association for Computational Linguistics.
G. Levow, V. Freeman, A. Hrynkevich, M. Ostendorf, R. Wright, J. Chan, Y. Luan, and T. Tran. 2014. Recognition of stance strength and polarity in spontaneous speech. In 2014 IEEE Spoken Language Technology Workshop (SLT), pages 236–241.

Stefano Menini, Federico Nanni, Simone Paolo Ponzetto, and Sara Tonelli. 2017. Topic-based agreement and disagreement in US electoral manifestos. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2938–2944. Association for Computational Linguistics.

Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and sentiment in tweets. ACM Transactions on Internet Technology, 17(3).

Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An automatic end-to-end fact checking system. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 78–83. Association for Computational Linguistics.

Pavithra Rajendran, Danushka Bollegala, and Simon Parsons. 2016. Contextual stance classification of opinions: A step towards enthymeme reconstruction in online reviews. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 31–39. Association for Computational Linguistics.

Sarvesh Ranade, Rajeev Sangal, and Radhika Mamidi. 2013. Stance classification in online debates by recognizing users' intentions. In Proceedings of the SIGDIAL 2013 Conference, pages 61–69. Association for Computational Linguistics.

Swapna Somasundaran and Janyce Wiebe. 2009. Recognizing stances in online debates. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 226–234. Association for Computational Linguistics.

Orith Toledo-Ronen, Roy Bar-Haim, and Noam Slonim. 2016. Expert stance graphs for computational argumentation. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 119–123. Association for Computational Linguistics.

David Vilares and Yulan He. 2017. Detecting perspectives in political debates. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1573–1582. Association for Computational Linguistics.

Henning Wachsmuth, Nona Naderi, Yufang Hou, Yonatan Bilu, Vinodkumar Prabhakaran, Tim Alberdingk Thijm, Graeme Hirst, and Benno Stein. 2017a. Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 176–187. Association for Computational Linguistics.

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, and Benno Stein. 2017b. Building an argument search engine for the web. In Proceedings of the 4th Workshop on Argument Mining, pages 49–59. Association for Computational Linguistics.