<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Siamese Neural Network for Same Side Stance Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milad Alshomary</string-name>
          <email>milad.alshomary@upb.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Wachsmuth</string-name>
          <email>henningw@upb.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Social Science Group, Department of Computer Science, Paderborn University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Classifying the stance of an argument towards its target is an important step in many applications of computational argumentation. A simpler variant of stance classification was recently proposed as a shared task, called same-side stance classification: Given two arguments on the same topic, decide whether they have the same stance. In this paper, we present our approach to the shared task, exploring the potential of modeling same-side stance classification as a similarity learning task. For this purpose, we train a siamese neural network on pairs of arguments represented in an embedding space. In the two scenarios of the shared task, within topics and cross topics, our approach achieved an accuracy of 0.53 and 0.56 respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In computational argumentation, stance
classification is the task of identifying the position of a claim
or a whole argument (usually either pro or con)
towards some target, such as a controversial topic
or another claim. Identifying the stance of a
natural language argument is a major step in argument
search engines (Wachsmuth et al., 2017), debating
technologies
        <xref ref-type="bibr" rid="ref3">(Bar-Haim et al., 2017)</xref>
        , and many
other downstream applications.
      </p>
      <p>The same-side stance classification task,1 a
simplified variant of stance classification, was
proposed as a shared task in the context of the RATIO
research program on argumentation,2 and its results
were presented at the 6th Workshop on Argument
Mining.3 The task is defined as:</p>
      <p>Given two arguments on the same topic,
classify whether the arguments have the
same stance towards the topic or not.</p>
      <p>1Same-side task, https://sameside.webis.de
2RATIO, http://www.spp-ratio.de
3ArgMining, https://argmining19.webis.de</p>
      <p>As suggested by the organizers, solving this task
does not require knowledge about the topic of the
argument, but focuses more on modeling features
of the argument pairs that actually capture stance,
thus making the task potentially easier. Still,
knowing whether two arguments are “on the same side”
helps in many downstream tasks, e.g., for
structuring discussions, for measuring the bias in a debate,
and for propagating a (known) stance of an
argument to other arguments.</p>
      <p>To approach same-side stance classification, a
dataset was provided in the shared task, where each
instance consists of two textual natural language
arguments from debate portals, along with a text
covering the topic they address. Two experimental
set-ups were introduced: (1) within topics and (2)
cross topics. In the former, the training set and the
test set contain the same topics. In the latter, the
test topics are disjoint from the training topics.</p>
      <p>
        In this paper, we investigate the hypothesis that
arguments with the same stance are more lexically
and/or semantically similar than those with
different stance. In particular, we explore the potential
of similarity-learning approaches in addressing the
given task. To this end, we first represent each
argument in an embedding space derived from the
words they span. Then, we learn to map the
arguments to a new space where similar arguments
(having the same stance) are closer to each other, and
other arguments are further away. Concretely, we
represent each argument by a document embedding
computed with the Flair library
        <xref ref-type="bibr" rid="ref1">(Akbik et al., 2018)</xref>
        ,
which is the average of the contextual string
embedding of each word in the argument. To learn
same-stance similarity, we then employ a siamese neural
network
        <xref ref-type="bibr" rid="ref5">(Bromley et al., 1994)</xref>
        that is trained to
minimize the distance between positive pairs and
to maximize it for the negative ones.
      </p>
      <p>For both experimental set-ups, we evaluate our
approach by first tuning its parameters on the
validation set and then evaluating it on the test set.
Within topics, our approach achieved an accuracy
of 53% in the shared task, whereas it classified
56% of the cross topics test cases correctly. While
these values are rather in the middle of the task
leaderboard, our analysis provides insights into the
adequacy of siamese neural networks for the task.
It seems likely that using the provided topic
information and/or experimenting with different
embedding techniques would boost their effectiveness.
</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The detection of stance as pro or con (and
possibly none, neutral, or similar) is a crucial step in
many technologies related to computational
argumentation. Much research has been dedicated to
this task
        <xref ref-type="bibr" rid="ref1 ref19">(Stede and Schneider, 2018)</xref>
        . Among these,
Bar-Haim et al. (2017) tackled the classification
of the stance of a claim towards a topic, and
Persing and Ng (2016) constructed a dataset to study
stance detection on student essays. Also, Krejzl
and Steinberger (2016) addressed a SemEval task
where, given a tweet and a target phrase, the goal
was to identify the stance of the tweet towards this
target. Unlike these works, the paper at hand
focuses on the same-side stance classification task
where the actual stance does not matter, but only
whether two texts have the same stance.
      </p>
      <p>
        Basically, we seek to learn a similarity function
that reflects the likelihood of two arguments having
the same stance. For this, we first represent these
arguments in a semantic embedding space. A large
body of research has investigated different ways of
learning word embeddings, including
        <xref ref-type="bibr" rid="ref1 ref17 ref4">(Bojanowski
et al., 2017; Peters et al., 2018; Akbik et al., 2018)</xref>
        .
Representing sentences and larger units of text in
an embedding space is a more complicated task.
Although many approaches have been proposed
for this task, such as
        <xref ref-type="bibr" rid="ref13 ref2">(Kiros et al., 2015; Arora et al.,
2017)</xref>
        , simply taking the average embedding of the
sentence’s words has proven to be a strong
baseline
        <xref ref-type="bibr" rid="ref8">(Conneau et al., 2018)</xref>
        .
      </p>
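      <p>As an illustration of this averaging baseline (a generic sketch, not the shared task code; the word vectors below are random stand-ins for pretrained embeddings):</p>
      <preformat>
```python
import numpy as np

# Hypothetical vocabulary of pretrained word vectors (random stand-ins).
rng = np.random.default_rng(0)
words = "we should not ban gay marriage".split()
vocab = {w: rng.normal(size=300) for w in words}

def sentence_embedding(tokens):
    """Average the embeddings of a sentence's in-vocabulary tokens."""
    vectors = [vocab[t] for t in tokens if t in vocab]
    return np.mean(vectors, axis=0)

emb = sentence_embedding("gay marriage should not be banned".split())
print(emb.shape)  # (300,)
```
      </preformat>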
      <p>
        Given argument embeddings, we transform them
into a new embedding space where arguments with the
same stance are similar. To this end, we employ a
siamese neural network, which was first
introduced by Bromley et al. (1994) to approach the
task of signature verification. Later, its architecture
was utilized for metric learning in tasks such
as face verification
        <xref ref-type="bibr" rid="ref7">(Chopra et al., 2005)</xref>
        , visual
pattern recognition
        <xref ref-type="bibr" rid="ref11">(Hu et al., 2014)</xref>
        , and many
others.
      </p>
      <p>[Figure 1: Overview of the approach. A pair of arguments is mapped to word embeddings (Flair), which are averaged into argument embeddings. A siamese network of two two-layer feed-forward networks with shared weights produces transformed embeddings, which are combined via their absolute difference; a sigmoid unit outputs a similarity score in [0,1].]</p>
      <p>
        In natural language processing, siamese neural
networks have been used, e.g., for learning
sentence similarity
        <xref ref-type="bibr" rid="ref14 ref15 ref16">(Mueller and Thyagarajan, 2016)</xref>
        and text categorization
        <xref ref-type="bibr" rid="ref18">(Shih et al., 2017)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>We hypothesized that arguments having the same
stance towards a given topic are usually more
similar semantically than arguments with opposite
stance. To model this similarity, we represent each
argument in an embedding space and then learn a
similarity function that reflects the likelihood of
having the same stance. Figure 1 gives an overview
of our approach, detailed in the following.</p>
      <p>
        Concretely, we map each argument to an
embedding using the contextual string embedding model
proposed by Akbik et al. (2018). The model
utilizes a character-level LSTM
        <xref ref-type="bibr" rid="ref10">(Graves, 2013)</xref>
        which
is trained to predict the next character given a
sequence of previous characters. The LSTM thus
generates for each character x_t in a given string
a predictive distribution P(x_t | x_0, ..., x_{t-1}),
encoded in the hidden state h_t of the LSTM. Building
on this, Akbik et al. (2018) trained a bi-directional
LSTM model, which consists of two LSTMs that
process the string in a forward (left-to-right) and in
a backward (right-to-left) manner. Thereby, each
character gets two hidden state representations, h^f_t
and h^b_t. Then, the embedding of a word that spans
x_2 ... x_k is constructed by concatenating the
forward hidden state h^f_{k+1} after the last character
x_k and the backward hidden state h^b_1 before the
first character, x_2. The embedding of the whole argument
text is obtained by averaging the embeddings of all
its words.
      </p>
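      <p>The construction of word and argument embeddings just described can be sketched as follows, with random matrices standing in for the forward and backward hidden states of the character-level language models (the indices follow the text: a word spanning characters x_2 ... x_k):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
n_chars, hidden = 12, 8  # toy string length and hidden size
h_f = rng.normal(size=(n_chars + 2, hidden))  # forward hidden states (stand-ins)
h_b = rng.normal(size=(n_chars + 2, hidden))  # backward hidden states (stand-ins)

def word_embedding(first, last):
    """Concatenate the forward hidden state after the word's last
    character with the backward hidden state before its first one."""
    return np.concatenate([h_f[last + 1], h_b[first - 1]])

def argument_embedding(word_spans):
    """Average the embeddings of all words in the argument."""
    return np.mean([word_embedding(i, j) for i, j in word_spans], axis=0)

arg = argument_embedding([(2, 5), (7, 10)])
print(arg.shape)  # (16,)
```
      </preformat>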
      <p>Afterwards, we utilize a siamese neural network
to learn a similarity function over the encoded
arguments. The input of the neural network consists of pairs
of arguments encoded as vectors in the embedding
space and a label y indicating whether the two
arguments have the same stance or not. The encoded
arguments are then passed through two feed-forward
neural networks that share their weights. An
absolute difference is computed from the two output
representations that is finally passed through one
layer with a single output and a sigmoid activation.
As a loss function L, we use binary cross entropy to
minimize the difference between predicted scores
(y^) and the true labels (y):</p>
      <p>L = -(y log(ŷ) + (1 - y) log(1 - ŷ))</p>
      <p>The idea behind this is to make arguments with the
same stance as similar as possible and those with
opposite stance as dissimilar as possible.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>This section describes experiments with our
approach within the shared task as well as their
results.</p>
      <p>Implementation As mentioned above, we use
the contextual string embeddings of Akbik et al.
(2018). Specifically, we resort to the pretrained
model provided in the Flair library, which is trained
over news articles.4 We represent each argument as
a vector of 4096 dimensions. The siamese neural
network we employed is implemented as two
feed-forward neural networks of two layers each, with
ReLU as the activation function. The two networks
share weights, resulting in an output vector of 128
dimensions. For the shared task, both models were
trained on batches of size 16 using the Adam optimizer
        <xref ref-type="bibr" rid="ref11 ref12">(Kingma and Ba, 2014)</xref>
        .
4Flair, github.com/zalandoresearch/flair</p>
      <p>[Table 1: Precision, recall, and accuracy of the participating systems in the within topics and cross topics scenarios.]</p>
      <p>
        Training 63,903 training argument pairs on two
topics (“abortion” and “gay marriage”) were
provided by the task organizers. For the within topics
scenario, we randomly split the provided data into
a training set (44,732 instances) and a validation
set (19,171 instances). Then, we chose the model
with the best accuracy on the validation set. In the
cross topics scenario, we randomly sampled 1000
pairs of arguments on “gay marriage” for validation.
We trained our model on the provided training set
and chose the configuration that performed best on
the validation set. In particular, this configuration
achieved an accuracy of 0.72 in the within topics
scenario and 0.54 in the cross topics scenario.
Results Table 1 shows the final results of our
approach on the held-out test set in the shared task,
in comparison to all other participating systems.
We achieved an accuracy of 0.53 within topics,
and 0.56 in the cross topics scenario. All top
approaches on the leaderboard fine-tuned some
variant of BERT
        <xref ref-type="bibr" rid="ref9">(Devlin et al., 2019)</xref>
        on the task. A
similar approach to ours is HHU SSSC, which also
used a siamese neural network but with
embeddings generated by BERT. The high effectiveness
obtained just by using BERT suggests that also our
approach might benefit from integrating it.
      </p>
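      <p>The within topics split described above can be sketched like this (the field names and seed are illustrative, not the organizers' data format):</p>
      <preformat>
```python
import random

# Hypothetical labeled argument pairs; field names are illustrative.
pairs = [{"arg_a": f"a{i}", "arg_b": f"b{i}", "same_side": i % 2}
         for i in range(63903)]

random.seed(42)               # arbitrary seed for reproducibility
random.shuffle(pairs)
train = pairs[:44732]         # training set
validation = pairs[44732:]    # validation set

print(len(train), len(validation))  # 44732 19171
```
      </preformat>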
      <p>Looking at the accuracy drop within topics from
validation (0.79) to test (0.53), our approach seems
to have overfitted to the specific content of the
training arguments. Interestingly, its effectiveness
across topics remains stable, indicating that the
siamese neural network does learn something
general to the task of same-side stance classification.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Same-side stance classification is a simplified
version of stance classification where the goal is to
classify whether two arguments on the same topic
have the same stance or not. In this paper, we have
presented the approach with which we participated in
the first same-side shared task. Our approach was
meant to explore the potential of modeling the task
as similarity learning using a siamese neural
network. The resulting model achieved 0.53 accuracy
on the within topics test set and 0.56 on the cross
topics test set, putting it roughly into the middle
of the leaderboard. Unlike us, the best systems all
utilized BERT embeddings.</p>
      <p>A follow-up work could study the integration
of siamese neural networks with embeddings such
as those from BERT. Besides, so far we refrained
from integrating the given topic into our approach
for simplicity. Making use of topic information to
solve the task may also be worth attempting.</p>
      <p>Henning Wachsmuth, Martin Potthast, Khalid
AlKhatib, Yamen Ajjour, Jana Puschmann, Jiani Qu,
Jonas Dorsch, Viorel Morari, Janek Bevendorff, and
Benno Stein. 2017. Building an argument search
engine for the web. In Proceedings of the 4th
Workshop on Argument Mining, pages 49–59,
Copenhagen, Denmark. Association for Computational
Linguistics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Alan</given-names>
            <surname>Akbik</surname>
          </string-name>
          , Duncan Blythe, and
          <string-name>
            <given-names>Roland</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Contextual string embeddings for sequence labeling</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1638</fpage>
          -
          <lpage>1649</lpage>
          ,
          Santa Fe, New Mexico, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Sanjeev</given-names>
            <surname>Arora</surname>
          </string-name>
          , Yingyu Liang, and Tengyu Ma.
          <year>2017</year>
          .
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          .
          <source>5th International Conference on Learning Representations, ICLR</source>
          <year>2017</year>
          ; Conference date: 24-04-2017 through 26-04-2017.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Roy</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          , Lilach Edelstein, Charles Jochim, and
          <string-name>
            <given-names>Noam</given-names>
            <surname>Slonim</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Improving claim stance classification with lexical knowledge expansion and context utilization</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Argument Mining</source>
          , pages
          <fpage>32</fpage>
          -
          <lpage>38</lpage>
          , Copenhagen, Denmark. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>5</volume>
          :
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jane</given-names>
            <surname>Bromley</surname>
          </string-name>
          , Isabelle Guyon,
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Eduard Säckinger, and
          <string-name>
            <given-names>Roopak</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Signature verification using a “siamese” time delay neural network</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>737</fpage>
          -
          <lpage>744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Sumit</given-names>
            <surname>Chopra</surname>
          </string-name>
          , Raia Hadsell,
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          , et al.
          <year>2005</year>
          .
          <article-title>Learning a similarity metric discriminatively, with application to face verification</article-title>
          .
          <source>In CVPR (1)</source>
          , pages
          <fpage>539</fpage>
          -
          <lpage>546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          , Germán Kruszewski, Guillaume Lample, Loïc Barrault, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>What you can cram into a single vector: Probing sentence embeddings for linguistic properties</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In NAACL-HLT.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Generating sequences with recurrent neural networks</article-title>
          .
          <source>ArXiv, abs/1308.0850</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Junlin</given-names>
            <surname>Hu</surname>
          </string-name>
          , Jiwen Lu, and
          <string-name>
            <given-names>Yap-Peng</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Discriminative deep metric learning for face verification in the wild</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>1875</fpage>
          -
          <lpage>1882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>CoRR, abs/1412.6980</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Kiros</surname>
          </string-name>
          , Yukun Zhu, Russ R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and
          <string-name>
            <given-names>Sanja</given-names>
            <surname>Fidler</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Skip-thought vectors</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3294</fpage>
          -
          <lpage>3302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Peter</given-names>
            <surname>Krejzl</surname>
          </string-name>
          and
          <string-name>
            <given-names>Josef</given-names>
            <surname>Steinberger</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>UWB at SemEval-2016 Task 6: Stance detection</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          , pages
          <fpage>408</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Mueller</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Thyagarajan</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Siamese recurrent architectures for learning sentence similarity</article-title>
          .
          <source>In Thirtieth AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Isaac</given-names>
            <surname>Persing</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling stance in student essays</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Matthew E.</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In Proc. of NAACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Chin-Hong</given-names>
            <surname>Shih</surname>
          </string-name>
          , Bi-Cheng Yan,
          <string-name>
            <given-names>Shih-Hung</given-names>
            <surname>Liu</surname>
          </string-name>
          , and Berlin Chen.
          <year>2017</year>
          .
          <article-title>Investigating siamese lstm networks for text categorization</article-title>
          .
          <source>In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)</source>
          , pages
          <fpage>641</fpage>
          -
          <lpage>646</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Manfred</given-names>
            <surname>Stede</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jodi</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Argumentation Mining</article-title>
          .
          <source>Number 40 in Synthesis Lectures on Human Language Technologies</source>
          . Morgan &amp; Claypool.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>