<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title/>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Bondarenko</string-name>
          <email>alexander.bondarenko@informatik.uni-halle.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekaterina Shirshakova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niklas Homann</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Hagen</string-name>
        </contrib>
        <aff>Martin-Luther-Universität Halle-Wittenberg</aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>We describe the ACQuA team's participation in the “Same Side Stance Classification” shared task (are two given arguments both on the pro or con side for some topic?) that was run as part of the ArgMining 2019 workshop.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, the popularity of social media and
discussion platforms has led to online pro and con
argumentation on almost every topic. Still, since not all
contributions in such online discussions clearly indicate
their stance or polarity, automatically identifying a
post’s stance could help readers quickly get an overview
of a discussion, similar to debating portals that list
pro/con arguments.</p>
      <p>In this extended abstract, we report on our
participation at the “Same Side Stance Classification”
shared task. The task was run as a pilot at the
ArgMining 2019 workshop and stated the
problem as: given two arguments, decide whether
either both support or both attack some controversial
topic like gay marriage—i.e., whether the two
arguments are “on the same side.”</p>
      <p>Given that the available time prior to the pilot
edition of the shared task was rather limited, we decided
to focus on examining the effectiveness of simple word
n-gram features and several variants of sentiment
detection for same side classification. We experiment
with three respective classifiers: (1) a simple rule-based
method “counting” positive and negative terms, (2) a
rule-based method with sentiment flipping that uses
sentiment and shifter lexicons, and (3) a gradient
boosting decision tree-based method using word n-grams
as features.</p>
      <p>Not too surprisingly, the evaluation results for
the three classifiers show that relying on sentiment
words or word n-grams alone cannot really solve
stance classification. Our “best” models achieve an
accuracy of 0.54 on binary-labeled balanced test
sets—obviously only a very slight improvement
over random guessing.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Stance classification has been studied in
numerous research publications proposing different
features. For instance, Walker et al. (2012) analyzed
11 feature types and showed that Naïve Bayes
using POS tags achieved better results than word
unigrams, while HaCohen-Kerner et al. (2017) applied
an SVM classifier on 18 feature types extracted
from tweets (hashtags, slang and emojis, POS tags,
character and word n-grams, etc.) and reported
good performance for character skip n-grams.
Nevertheless, word n-grams have been a very common
choice in many stance classification experiments.</p>
      <p>Also common for stance classification is the
utilization of sentiment attributes. For
instance, Somasundaran and Wiebe (2010) combined
argumentation-based features (1- to 3-grams
extracted from sentiments and argument targets) with
sentiment-based features (sentiment lexicon with
negative and positive words).</p>
      <p>Comparing different classification models, Liu
et al. (2016) showed in their evaluation that gradient
boosting decision trees outperform SVMs for stance
classification. More recently, neural approaches have
been successfully applied to stance classification:
Popat et al. (2019) tuned BERT with hidden state
representations, and Durmus et al. (2019) used BERT
fine-tuned with path information extracted from argument
trees for 741 topics from kialo.com.</p>
      <p>Given the limited time prior to the shared task,
we simply wanted to test word n-grams (gradient
boosting tree-based classifier) and sentiment
features (rule-based classifiers) as common feature
types for stance classification.</p>
    </sec>
    <sec id="sec-3">
      <title>Task and Data</title>
      <p>The “Same Side Stance Classification” shared task
has two experimental settings: within-topic
(argumentative topics for training and test are the same)
and cross-topic (argumentative topics for training
and test are different).</p>
      <p>The provided data are argumentative topics and
corresponding pairs of arguments collected from
the debating portals idebate.org, debatepedia.org,
debatewise.org, and debate.org. The data is split into
training sets (within-topic: 63,903 argument pairs for
the two topics abortion and gay marriage; cross-topic:
61,048 argument pairs for the topic abortion) and test
sets (within-topic: 31,475 argument pairs for the two
topics abortion and gay marriage; cross-topic: 6,163
argument pairs for the topic gay marriage). We randomly
split the provided training sets into local training,
validation, and test sets (80:10:10).</p>
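      <p>The 80:10:10 split can be sketched as follows (a minimal illustration; the function name and seed are ours, not from the shared task):</p>

```python
import random

def split_80_10_10(pairs, seed=42):
    """Shuffle argument pairs and split them into local
    training, validation, and test sets (80:10:10)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    n_valid = int(len(shuffled) * 0.1)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```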
    </sec>
    <sec id="sec-4">
      <title>ACQuA Runs</title>
      <p>Our three runs are based on (1) a rule-based
classifier, (2) a rule-based classifier with sentiment
flipping, and (3) gradient boosting decision trees.</p>
      <sec id="sec-4-1">
        <title>Rule-based classification</title>
        <p>Argument stances either support or attack some
argumentative topic. In other words, they convey a
positive or a negative “sentiment” towards the topic.
Since the shared task is topic-agnostic (i.e., there is
no need to distinguish topic-specific argumentation
vocabulary), our first run only tries to identify whether
a pair of arguments expresses the same sentiment. A
plethora of approaches have been proposed to classify the
sentiment of opinions as positive or negative (or
neutral), but given the time constraints of task
participation, we decided to investigate whether
sentiment signals in the simplest form of lexicon-based
counts of positive and negative terms can contribute to
same side classification.</p>
        <p>Employing Hu and Liu (2004)’s sentiment lexicon,
we use sentiment marker keyword lists for sentiment
detection (e.g., good vs. bad). Depending on whether the
positive or the negative markers have the higher total
count, the rule-based classifier assigns the respective
label to the argument—note that sentiment-flipping terms
(e.g., not bad) are not handled in our first run.</p>
        <p>Our code is available at https://github.com/webis-de/argmining19-acqua-same-side/.</p>
        <p>If the counts of positive and negative markers are
equal, or if an argument does not contain any marker, a
random label is assigned. This is the case for about 25%
of the provided within-topic and about 20% of the provided
cross-topic training pairs (and for 12% of the
within-topic and about 19% of the cross-topic test
pairs).</p>
        <p>Finally, if the count-based sentiments of an
argument pair agree, the pair is classified as “same
side.”</p>
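        <p>The decision procedure of this first run can be sketched as follows; the tiny word sets are illustrative stand-ins for the Hu and Liu (2004) lexicon, and all function names are ours:</p>

```python
import random

# Toy stand-ins for the positive/negative keyword lists of the
# Hu and Liu (2004) sentiment lexicon (illustrative, not the real lexicon).
POSITIVE = {"good", "great", "beneficial", "right"}
NEGATIVE = {"bad", "wrong", "harmful", "terrible"}

_rng = random.Random(0)

def sentiment_label(argument):
    """Count positive vs. negative markers; random label on ties/no markers."""
    tokens = argument.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "pro"
    if neg > pos:
        return "con"
    return _rng.choice(["pro", "con"])  # tie or no markers: random label

def same_side(argument_a, argument_b):
    """A pair is 'same side' iff both arguments receive the same label."""
    return sentiment_label(argument_a) == sentiment_label(argument_b)
```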
      </sec>
      <sec id="sec-4-2">
        <title>Rule-based classification with sentiment flipping</title>
        <p>We re-implemented a sentiment classifier that forms
one step of the three-step approach proposed by Bar-Haim
et al. (2017) to classify a single claim’s stance as pro
or con with respect to some controversial topic. The
complete approach combines argument target identification
with sentiment detection and consistency/contrastiveness
classification. In a semester-long student project, we
re-implemented parts of this approach and verified that it
produces results similar to the originally reported
performances.</p>
        <p>In the setting of the “Same Side Stance
Classification” shared task, we applied only the sentiment
classifier, which follows the approach of Ding et al.
(2008) and uses counts of sentiment words matched against
the lexicon of Hu and Liu (2004) (the same one used in
our first approach) and the shifter lexicon of Polanyi
and Zaenen (2006) (sentiment shifters flip the polarity
of sentiment words). We could not directly apply the
target identifier and the contrast classifier due to
differences in the semantic structures of the IBM and
Same Side datasets.</p>
        <p>If the counts of positive and negative sentiment
words are equal, or if an argument does not contain any
sentiment words, the pair is labeled as being on the same
side (this reflects the majority label in the IBM
dataset). This is the case for about 4% of the provided
within-topic and about 0.3% of the provided cross-topic
pairs in the official test set.</p>
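        <p>A minimal sketch of the flipping logic, assuming toy stand-ins for the Hu and Liu (2004) sentiment lexicon and the Polanyi and Zaenen (2006) shifter lexicon (all word lists and names are illustrative, not the real resources):</p>

```python
# Toy stand-ins for the sentiment lexicon of Hu and Liu (2004) and the
# shifter lexicon of Polanyi and Zaenen (2006); illustrative only.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}
SHIFTERS = {"not", "never", "hardly"}

def flipped_label(argument):
    """Count sentiment words, flipping polarity after a shifter ('not bad').

    Returns 'pro', 'con', or None on a tie / when no sentiment word occurs.
    """
    tokens = argument.lower().split()
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE or tok in NEGATIVE:
            polarity = 1 if tok in POSITIVE else -1
            if i > 0 and tokens[i - 1] in SHIFTERS:
                polarity = -polarity  # shifter flips the sentiment word
            if polarity > 0:
                pos += 1
            else:
                neg += 1
    if pos == neg:
        return None
    return "pro" if pos > neg else "con"

def same_side(argument_a, argument_b):
    """Ties/no-sentiment cases default to 'same side' (IBM majority label)."""
    label_a, label_b = flipped_label(argument_a), flipped_label(argument_b)
    if label_a is None or label_b is None:
        return True
    return label_a == label_b
```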
      </sec>
      <sec id="sec-4-3">
        <title>Gradient boosting decision trees</title>
        <p>
          In our third run, we use the fast gradient boosting
framework LightGBM
          <xref ref-type="bibr" rid="ref7">(Ke et al., 2017)</xref>
          that employs tree-based learning algorithms. LightGBM
is often used for text classification tasks, including one
of the winning approaches in the Kaggle competition on
identifying duplicate Quora questions
          <xref ref-type="bibr" rid="ref6">(Iyer et al., 2017)</xref>
          . We use token frequencies and tf-idf-weighted bags
of 1-, 2-, 3-, 1–2-, and 1–3-gram lemmas as features (a
common choice in text classification tasks).
        </p>
        <p>As LightGBM returns a confidence for its
predictions, we ran preliminary experiments with different
thresholds on our local training and validation sets to
select the best performing parameters. The following
features and thresholds achieved the highest accuracy in
these pilot experiments: tf-idf-weighted unigram lemmas
and a confidence threshold of 0.520 for the within-topic
setup and of 0.501 for the cross-topic setup.</p>
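        <p>The thresholding step can be sketched as follows; p_same stands for LightGBM's predicted confidence for the “same side” class, and the function name is ours:</p>

```python
# Confidence thresholds selected in our pilot experiments.
WITHIN_TOPIC_THRESHOLD = 0.520
CROSS_TOPIC_THRESHOLD = 0.501

def same_side_from_confidence(p_same, threshold):
    """Label a pair 'same side' iff the classifier's confidence
    for the 'same side' class reaches the tuned threshold."""
    return p_same >= threshold
```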
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and Results</title>
      <p>We use our local training, validation, and test
sets (80:10:10) to train, validate, and test the
LightGBM-based classifier, and we only test the two
rule-based classifiers locally (they have no training
step); the classification accuracies on the local test
set are given in Table 1. The simple rule-based and
LightGBM approaches perform only very slightly better
than random guessing informed about the balanced data
(50:50 same/different side). One possible reason in the
case of the rule-based classifier without flipping is
that about 25% of the cases were decided randomly due to
ties in the numbers of positive/negative terms.
Surprisingly, considering sentiment flipping only
worsened the performance. In the case of the LightGBM
approach, simple word n-gram lemmas are probably still
not sufficient as features for a stance classification
decision tree.</p>
      <p>Even though our approaches performed very poorly
on the local data, we submitted all three approaches with
their best parameter settings as runs for the shared
task. To this end, the LightGBM-based approach was
trained on the full official training set.</p>
      <p>The accuracies for all three of our runs as reported
by the task organizers are shown in Table 2. Not too
surprisingly, also on the official test set, the
performance of the rule-based approaches and of the
LightGBM-based approach does not really improve upon
informed random guessing (50:50 label balance). Note that
the slightly better performance of the rule-based approach
without flipping on the official test set compared to our
local test set might be due to fewer random decisions in
case of ties for the numbers of positive/negative
dictionary words (12% vs. 25%).</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We have submitted three approaches to the shared
task on same side stance classification (i.e.,
deciding whether two arguments are “on the same
side” for a given topic): (1) a simple rule-based
sentiment-oriented approach, (2) a rule-based
sentiment classifier with flipping, and (3) gradient
boosted decision trees with tf-idf-weighted
unigram lemmas as features.</p>
      <p>None of our runs really improves upon informed
random guessing. Sentiment in the simplistic form of our
rule-based models does not seem to help much in same side
classification.</p>
      <p>A proper adaptation of IBM Research’s complete
stance classifier to the same side classification task,
as well as training classifiers over word embeddings and
deploying neural classifiers, are interesting directions
for future research.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the
Deutsche Forschungsgemeinschaft (DFG) within
the project “Answering Comparative Questions
with Arguments (ACQuA)” (grant HA 5851/2-1)
that is part of the Priority Program “Robust
Argumentation Machines (RATIO)” (SPP-1999).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Roy</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          , Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and
          <string-name>
            <given-names>Noam</given-names>
            <surname>Slonim</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Stance Classification of Context-Dependent Claims</article-title>
          .
          <source>In Proceedings of ACL 2017</source>
          , pages
          <fpage>251</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Xiaowen</given-names>
            <surname>Ding</surname>
          </string-name>
          , Bing Liu, and
          <string-name>
            <given-names>Philip S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A Holistic Lexicon-Based Approach to Opinion Mining</article-title>
          .
          <source>In Proceedings of WSDM 2008</source>
          , pages
          <fpage>231</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Esin</given-names>
            <surname>Durmus</surname>
          </string-name>
          , Faisal Ladhak, and
          <string-name>
            <given-names>Claire</given-names>
            <surname>Cardie</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Determining Relative Argument Specificity and Stance for Complex Argumentative Structures</article-title>
          .
          <source>In Proceedings of ACL 2019</source>
          , pages
          <fpage>4630</fpage>
          -
          <lpage>4641</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Yaakov</given-names>
            <surname>HaCohen-Kerner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ziv</given-names>
            <surname>Ido</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ronen</given-names>
            <surname>Ya'akobov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Stance Classification of Tweets using Skip Char Ngrams</article-title>
          .
          <source>In Proceedings of ECML PKDD</source>
          <year>2017</year>
          , pages
          <fpage>266</fpage>
          -
          <lpage>278</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Minqing</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Mining and Summarizing Customer Reviews</article-title>
          .
          <source>In Proceedings of SIGKDD 2004</source>
          , pages
          <fpage>168</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Shankar</given-names>
            <surname>Iyer</surname>
          </string-name>
          , Nikhil Dandekar, and Kornél Csernai.
          <year>2017</year>
          . First Quora Dataset Release: Question Pairs.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Guolin</given-names>
            <surname>Ke</surname>
          </string-name>
          , Qi Meng, Thomas Finley, Taifeng Wang,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chen</surname>
          </string-name>
          , Weidong Ma, Qiwei Ye, and
          <string-name>
            <given-names>Tie-Yan</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>LightGBM: A Highly Efficient Gradient Boosting Decision Tree</article-title>
          .
          <source>In Proceedings of NIPS 2017</source>
          , pages
          <fpage>3146</fpage>
          -
          <lpage>3154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Can</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bradford</given-names>
            <surname>Demarest</surname>
          </string-name>
          , Yue Chen, Sara Couture, Daniel Dakota, Nikita Haduong, Noah Kaufman, Andrew Lamont, Manan Pancholi,
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Steimel</surname>
          </string-name>
          , and Sandra Kübler.
          <year>2016</year>
          .
          <article-title>IUCL at SemEval-2016 Task 6: An Ensemble Model for Stance Detection in Twitter</article-title>
          .
          <source>In Proceedings of SemEval-2016</source>
          , pages
          <fpage>394</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Livia</given-names>
            <surname>Polanyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Annie</given-names>
            <surname>Zaenen</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Contextual Valence Shifters</article-title>
          .
          <source>In Computing Attitude and Affect in Text: Theory and Applications</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Kashyap</given-names>
            <surname>Popat</surname>
          </string-name>
          , Subhabrata Mukherjee,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Yates</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>STANCY: Stance Classification Based on Consistency Cues</article-title>
          .
          <source>In Proceedings of EMNLP-IJCNLP</source>
          <year>2019</year>
          , pages
          <fpage>6412</fpage>
          -
          <lpage>6417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Swapna</given-names>
            <surname>Somasundaran</surname>
          </string-name>
          and
          <string-name>
            <given-names>Janyce</given-names>
            <surname>Wiebe</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Recognizing Stances in Ideological Online Debates</article-title>
          .
          <source>In Proceedings of the Workshop CAAGET at NAACL HLT</source>
          <year>2010</year>
          , pages
          <fpage>116</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Marilyn A.</given-names>
            <surname>Walker</surname>
          </string-name>
          , Pranav Anand, Rob Abbott,
          <string-name>
            <given-names>Jean E.</given-names>
            <surname>Fox Tree</surname>
          </string-name>
          , Craig Martelly, and
          <string-name>
            <given-names>Joseph</given-names>
            <surname>King</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>That is Your Evidence?: Classifying Stance in Online Political Debate</article-title>
          .
          <source>Decision Support Systems</source>
          ,
          <volume>53</volume>
          (
          <issue>4</issue>
          ):
          <fpage>719</fpage>
          -
          <lpage>729</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>