<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer Learning with Sentence Embeddings for Argumentative Evidence Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Liga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monica Palmirani</string-name>
        </contrib>
        <aff>
          <institution>Alma Mater Studiorum - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Luxembourg</institution>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <fpage>11</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>This work describes a simple Transfer Learning methodology aimed at discriminating evidence related to Argumentation Schemes using three different pre-trained neural architectures. Although Transfer Learning techniques are increasingly gaining momentum, the number of Transfer Learning works in the field of Argumentation Mining is relatively small and, to the best of our knowledge, no attempt has been made in the specific direction of discriminating evidence related to Argumentation Schemes. The research question of this paper is whether Transfer Learning can discriminate Argumentation Schemes' components, a crucial yet rarely explored task in Argumentation Mining. Results show that, even with small amounts of data, classifiers trained on sentence embeddings extracted from pre-trained transformers can achieve encouraging scores, outperforming previous results on evidence classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Argumentation Mining</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Argumentation Schemes</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the last few years, the use of Transfer Learning methodologies has generated remarkable momentum in the state of the art of many Natural Language Processing tasks. In particular, the Transformer known as "Bidirectional Encoder Representations from Transformers" (BERT) has shown extremely good results, establishing several new records in terms of metric scores [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In 2018, BERT obtained new state-of-the-art results on eleven NLP-related tasks. Within a couple of years, dozens of variants were developed, establishing further records not just in English but also in other languages (e.g., the Italian versions GilBERTo and umBERTo, and the French CamemBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        Despite the popularity recently achieved by Transfer Learning techniques, these methodologies have been applied relatively few times in Argumentation Mining [
        <xref ref-type="bibr" rid="ref12 ref14">12, 14</xref>
        ]. To the best of our knowledge, this is the first work that explicitly assesses Transfer Learning performance with the aim of discriminating argumentative components related to Argumentation Schemes [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. On the one hand, the approach proves capable of discriminating argumentative stances of support and opposition related to some well-known argumentative patterns (Argumentation Schemes), such as the Argument from Expert Opinion and the Argument from Negative Consequences, showing better results compared to previous studies. On the other hand, the approach shows that it is possible to cluster Argumentation Schemes according to the criteria of the pragmatic dimension, a crucial aspect described in the most recent literature on Argumentation Scheme classification [
        <xref ref-type="bibr" rid="ref10 ref6">10, 6</xref>
        ]. In summary, the approach shows an ability to classify argumentative evidence not only at fine-grained levels (e.g., different instances of the Argument from Expert Opinion) but also at the level of large clusters (like the Argumentation Schemes coming from an external source, a class which, according to some classification approaches, can be used as a first dichotomic criterion of discrimination among schemes [
        <xref ref-type="bibr" rid="ref10 ref6">10, 6</xref>
        ]).
      </p>
      <p>Section 2 will describe the Transfer Learning methodology and the two main settings for the experiments. Section 3 will describe the datasets used for the experiments in the two scenarios. Sections 4 and 5 will show the experimental results on the two scenarios. Section 6 will describe related works. In Section 7, some final considerations will conclude the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>Transfer Learning methods are generally divided into two approaches. The first approach is called fine-tuning, and it consists of using a pre-trained neural architecture (i.e., a Transformer architecture trained on large amounts of data) as a starting point to perform further training steps on a downstream task (thus training the neural architecture on downstream data). The second approach, instead, is to use a pre-trained neural architecture just to extract the outputs that it generates for a given input at a specific stage of the network. For example, a sentence can be used as input and the output generated by the neural architecture can be extracted and used as a sentence embedding, which can represent the sentence in other downstream tasks (notably, the extraction of the generated output to be used as an embedding can be performed at different stages of the neural architecture, not necessarily at the final layer). In this paper, the second approach will be employed: a famous pre-trained architecture will be selected, sentences will be used as inputs for this neural architecture, and the outputs coming from the neural architecture will be employed as sentence embeddings to represent our data in a series of downstream classification tasks.</p>
      <p>
        For the pre-trained embeddings we will employ three pre-trained models: the first one is the famous neural Transformer called BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (specifically, we will use the uncased base version). The second and third models are two recent models derived from BERT, namely DistilBERT [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and RoBERTa [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (uncased). While BERT base consists of 12 layers, 768 hidden dimensions, 12 self-attention heads and nearly 110M parameters, RoBERTa base consists of 12 layers, 768 hidden dimensions, 12 self-attention heads and 125M parameters. Finally, DistilBERT consists of 6 layers, 768 hidden dimensions, 12 self-attention heads and 66M parameters.
      </p>
      <p>To extract the embeddings from the neural models, each input sentence must first be tokenized according to the requirements of the given model. Typically, with BERT, the special tokens [CLS] and [SEP] are inserted at the beginning and at the end of the input (we are interested in the first one, which is the token holding the classification output we want to extract for the input sentence). Moreover, the length of each input sentence is set to a maximum length: all sentences longer than that limit are shortened, while all sentences shorter than that limit are padded with the special [PAD] token. This process makes sure that all inputs have the same length before entering the neural architecture. After the tokenization, inputs are passed through the neural architecture of a BERT transformer, while deactivating the calculation of gradients.</p>
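      <p>As a rough illustration of this preprocessing step, the padding and truncation logic can be sketched in plain Python. Note that this is only a sketch: the whitespace splitting and the maximum length below are illustrative placeholders, since a real pipeline would use the model's own subword tokenizer.</p>
      <preformat>
```python
# Illustrative sketch of BERT-style input preparation: wrap each sentence
# with [CLS]/[SEP], then truncate or pad to a fixed maximum length.
# Whitespace splitting and MAX_LEN are illustrative; a real pipeline uses
# the model's subword tokenizer.

MAX_LEN = 8  # illustrative maximum length


def prepare_input(sentence, max_len=MAX_LEN):
    tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
    if len(tokens) > max_len:
        # shorten, keeping the final [SEP] marker
        tokens = tokens[:max_len - 1] + ["[SEP]"]
    else:
        # pad shorter sentences up to the fixed length
        tokens += ["[PAD]"] * (max_len - len(tokens))
    return tokens


print(prepare_input("experts agree on this"))
print(prepare_input("a very long sentence that exceeds the maximum length"))
```
      </preformat>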
      <p>After having transformed each input sentence of the datasets into tokens and having used these tokens as inputs for the BERT neural architecture, the resulting extracted embeddings have been used, in turn, as inputs for two classification procedures: a Support Vector Machine (SVM) classifier and a Logistic Regression classifier (LRC). Notice that for the experiment on D3 our SVM employed a Linear Support Vector Classifier (Linear SVC), while in all other experiments we employed a standard Support Vector Classifier (SVC).</p>
      <p>The classification method is One-vs-All, which means that the classification has been performed for each class, considering one class against all the other classes, a typical approach in multiclass and multilabel scenarios. Finally, all classifiers have been evaluated on the corresponding test set.</p>
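      <p>Under this setup, the downstream classification stage can be sketched with scikit-learn. Here random vectors stand in for the 768-dimensional sentence embeddings extracted from the transformers, and the One-vs-All scheme is made explicit with OneVsRestClassifier; the data and class counts are illustrative assumptions, not the paper's actual datasets.</p>
      <preformat>
```python
# Sketch of the downstream classification stage (scikit-learn), using
# random vectors as stand-ins for the 768-dimensional sentence embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 768))    # stand-in sentence embeddings
y = rng.integers(0, 3, size=120)   # three illustrative classes

# One-vs-All: one binary classifier per class, as in the setup above.
svm = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
lr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(svm.predict(X[:5]), lr.predict(X[:5]))
```
      </preformat>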
      <p>The experiments have been divided into different scenarios:
1. Baseline scenario: in this scenario, the classification was performed on the same setting as two previous works, taken as baselines for comparison.
2. Extended scenario: in this scenario, the classification was performed on new settings, using an extended version of two datasets from the baseline scenario.</p>
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <p>The experiments of this work have been applied to the datasets listed in Table 1, which also reports the number of instances for each dataset. These datasets have been selected because their annotations describe classes of argumentative evidence directly related to specific Argumentation Schemes. Importantly, during the experiments, all datasets have been split into train and test sets, following a standard 80/20 ratio.</p>
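      <p>The 80/20 split just mentioned can be sketched with scikit-learn's train_test_split; the sentences and labels below are illustrative placeholders rather than instances from the actual datasets.</p>
      <preformat>
```python
# Sketch of the standard 80/20 train/test split described above.
from sklearn.model_selection import train_test_split

sentences = [f"sentence {i}" for i in range(100)]  # placeholder data
labels = [i % 2 for i in range(100)]               # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_train), len(X_test))
```
      </preformat>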
      <p>
        Regarding the baseline scenario, D1 and D2 are portions of Al Khatib et al. 2016 and Aharoni et al. 2014 respectively, two important datasets designed by IBM. Only two classes from the original datasets have been selected, reproducing the scenario in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in order to have baseline scenarios for our classifiers. D3 is a small dataset (only 638 sentences) from Liga and Palmirani 2019. It is a dataset with different levels of granularity, depending on how many classes are considered. In this case we selected granularity three, which contains three labels.
      </p>
      <p>
        Regarding the extended scenario, the dataset D1+ is an extension of D1: instead of extracting just two classes, it considers three classes. The inputs of the dataset from Al Khatib et al. 2016 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are actually structured in a very fragmented way, so we needed to rebuild the sentences following the approach suggested in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Similarly, D2+ is an extension of D2 (instead of being a selection of just two classes, it considers three classes). Finally, D2++ is an extended version of the same dataset which, having many more instances, can be a useful benchmark for this kind of classification.
      </p>
      <p>
        Importantly, the datasets employed in this work are among the few available datasets containing instances of argumentative evidence that can be related to Argumentation Schemes. Namely, the dataset in Al Khatib et al. 2016 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] contains instances of argumentative evidence labelled as Study, Testimony and Anecdotal: these pieces of evidence support argumentative claims which refer to source-based opinions, meaning that they belong to different types of source-based arguments. One of the most famous examples of a source-based Argumentation Scheme is the well-known Argument from Expert Opinion; another famous scheme is the Argument from Witness Testimony (more details about this kind of scheme can be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>
        The datasets in Aharoni et al. 2014 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Rinott et al. 2015 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] present similar source-based Argumentation Schemes (however, this time the labels are Study, Expert and Anecdotal). In this case, the cluster of argumentative evidence labelled with the class Expert is likely to be compatible with the evidence of an Argument from Expert Opinion scheme.
      </p>
      <p>
        The dataset in Liga and Palmirani 2019 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] instead offers only one class of evidence related to source-based arguments (Testimony), while another class relates to a cluster of evidence which can be linked to the Argument from Negative Consequences and the Slippery Slope Argument.
      </p>
      <p>These three datasets can thus be used to assess whether classifiers are able to discriminate between different clusters of argumentative evidence. Since these pieces of argumentative evidence are strictly related to specific clusters of Argumentation Schemes, the ability of classifiers to discriminate different clusters of argumentative evidence is, in our opinion, a crucial step towards Argumentation Scheme discrimination.</p>
    </sec>
    <sec id="sec-4">
      <title>Results for the Baseline Scenario</title>
      <p>
        The classifications in this section show that the proposed approach is able to outperform recent results in the Argumentation Mining literature. With this purpose, recent results on D1, D2 and D3 are reported [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and used as baselines for our classifiers.
      </p>
      <p>(Table 2: F1 scores of the SVM and LR classifiers on BERT Base, DistilBERT and RoBERTa embeddings.)</p>
      <p>In this paper, all F1 scores per class are calculated as the mean macro F1 scores, taken from each One-vs-All classification. All these scores are finally averaged and reported as the mean F1 (for each classifier, i.e. SVM and LR).</p>
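      <p>The scoring just described can be sketched as follows, binarizing each One-vs-All task and averaging the macro F1 scores with scikit-learn; the label arrays are illustrative, not scores from the actual experiments.</p>
      <preformat>
```python
# Sketch of the reported metric: macro F1 for each One-vs-All task,
# then the mean over tasks. The labels here are illustrative.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 0, 0, 1, 2, 0])

per_class_f1 = []
for cls in np.unique(y_true):
    # binarize: current class vs all the other classes
    t = (y_true == cls).astype(int)
    p = (y_pred == cls).astype(int)
    per_class_f1.append(f1_score(t, p, average="macro"))

mean_f1 = float(np.mean(per_class_f1))
print(round(mean_f1, 3))
```
      </preformat>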
      <p>As can be seen from Table 2, results outperform previous results for the same scenario, showing the ability of Transfer Learning techniques to achieve high performance. As indicated by the bold numbers in Table 4, for D1, D2 and D3 there are always at least four classifiers out of six which outperform the baseline.</p>
    </sec>
    <sec id="sec-5">
      <title>Results for the Extended Scenario</title>
      <p>The next series of experiments has been performed on extended versions of two datasets from the baseline scenario (D1 and D2), to assess how performance changes in a multiclass scenario.</p>
      <p>Regarding the classifications on D1+, one can see that the best performances are achieved by the Logistic Regression classifier (LR) trained on sentence embeddings extracted using DistilBERT. To have a better understanding of these results, the confusion matrix of the best classifier in this scenario (i.e., Logistic Regression on DistilBERT) is reported alongside the confusion matrix of the best classifier of the baseline scenario (i.e., Support Vector Machine on DistilBERT embeddings from Table 2) in Figure 1.</p>
      <p>(Figure 1 panels: D1, study vs testimony, SVM on DistilBERT; D1+, study vs others and testimony vs others, LR on DistilBERT.)</p>
      <p>Regarding the classifications on D2+ and D2++, one can see that the best performances are achieved by the Logistic Regression classifier (LR) trained on sentence embeddings extracted using DistilBERT and BERT Base. Also in this case, to have a better understanding of the results, the confusion matrices of the best classifiers in this scenario (i.e., Logistic Regression from DistilBERT embeddings and from BERT Base) are reported alongside the confusion matrix of the best classifier of the baseline scenario (i.e., Logistic Regression from RoBERTa embeddings from Table 2) in Figure 2.</p>
      <p>Notice that while the confusion matrices for D1 and D2 (in green) show a binary classification, the other confusion matrices in blue (relative to D1+, D2+ and D2++) show a one-vs-all classification. These blue matrices show that the classifiers are able to recognize classes also in a multiclass scenario. While Figure 1 shows an imbalance (probably due to the predominance of the class anecdotal), the results in Figure 2 seem more balanced: the diagonal always follows a 30/60 ratio, indicating the goodness of the predictions.</p>
    </sec>
    <sec id="sec-6">
      <title>Related works</title>
      <p>
        Unfortunately, datasets specifically designed to allow a direct link between classes and specific Argumentation Schemes are very few. A promising and growing resource, in this sense, is the set of corpora in AIFdb [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], thanks also to the contribution of tools like OVA+ [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which recently added a very important component for Argumentation Scheme annotation called the Argument Scheme Key [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>(Figure 2 panels: D2, study vs expert, LR on RoBERTa; D2+ and D2++, study vs others and expert vs others, LR on DistilBERT and on BERT Base.)</p>
      <p>Moreover, although there have been different works on text classification in Argumentation Mining, only a few studies have focused on classification tasks aimed at facilitating the discrimination of Argumentation Schemes.</p>
      <p>
        Rinott et al. 2015 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] achieved important results on evidence detection employing the dataset D2++. However, their approach is mostly context-dependent, while the present work does not consider the context. In Liga 2019 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the classification has been performed using Tree Kernel classifiers on D1 and D2, which contain argumentative evidence of support, among which it is possible to find evidence directly related to the Argument from Expert Opinion. That work is, however, limited to a binary classification. A similar approach, in a multiclass scenario, is described in Liga and Palmirani 2019 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where Tree Kernels are employed on D3, a small dataset which considers argumentative evidence of opposition, among which one can find, for example, the Slippery Slope Argument. Considering these two works as baselines, the approach presented in this paper seems capable of outperforming the previous achievements.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>The datasets analyzed in this work are composed of argumentative evidence directly related to different clusters of arguments. For example, many instances found in the datasets of this paper are directly related to the cluster of source-based arguments. Other instances of argumentative evidence are instead specifically related to the Argumentation Scheme from Expert Opinion, while others are related to the cluster which includes the Argument from Negative Consequences and the Slippery Slope Argument (which do not belong to the cluster of source-based arguments).</p>
      <p>
        We believe that the ability to discriminate different clusters of argumentative evidence is a crucial step in the classification of Argumentation Schemes. For example, the discrimination of clusters of Argumentation Schemes can be performed in a pipeline of binary classifications, starting from source-based versus non-source-based arguments and continuing towards more specific binary classifications (similarly to the path of dichotomous choices followed by ASK, the annotation system recently elaborated in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which offers a valuable system for the classification of Argumentation Schemes).
      </p>
      <p>In general, the results presented in this paper seem encouraging, showing that pre-trained embeddings can outperform previous results in the field of Argumentation Mining related to the classification of argumentative evidence. An interesting aspect is that the proposed classifiers show encouraging results not only in the discrimination among different kinds of source-based argumentative evidence, but also in classifications involving source-based versus non-source-based argumentative evidence (i.e. with dataset D3).</p>
      <p>
        However, further analysis is needed to verify if and how Transfer Learning techniques can discriminate argumentative evidence in such a way that they can facilitate Argumentation Scheme discrimination. In this regard, the present paper is just a preliminary exploration of a promising approach. In future works, other Transfer Learning techniques should be assessed too. For example, it could be useful to compare the performances of the two main Transfer Learning techniques: sentence embeddings and fine-tuning. Also, other pre-trained models should be employed and compared (e.g., XLNet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and ALBERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>A long-term goal is being able to connect natural language argumentative evidence to its specific Argumentation Schemes, which can be a further step in the development of an artificial Natural Argumentation Understanding.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aharoni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polnarov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hershcovich</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Rinott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Gutfreund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Slonim</surname>
          </string-name>
          , N.:
          <article-title>A benchmark dataset for automatic detection of claims and evidence in the context of controversial topics</article-title>
          .
          <source>In: Proceedings of the First Workshop on Argumentation Mining</source>
          . pp.
          <volume>64</volume>
          –
          <issue>68</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Al</given-names>
            <surname>Khatib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>A news editorial corpus for mining argumentation strategies</article-title>
          .
          <source>In: Proceedings of COLING</source>
          <year>2016</year>
          ,
          <source>the 26th International Conference on Computational Linguistics: Technical Papers</source>
          . pp.
          <volume>3433</volume>
          –
          <issue>3443</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soricut</surname>
          </string-name>
          , R.: Albert:
          <article-title>A lite bert for self-supervised learning of language representations</article-title>
          . arXiv preprint arXiv:
          <year>1909</year>
          .
          <volume>11942</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Argument mining: A survey</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>45</volume>
          (
          <issue>4</issue>
          ),
          <volume>765</volume>
          –
          <fpage>818</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Visser</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>An online annotation assistant for argument schemes</article-title>
          .
          <source>In: Proceedings of the 13th Linguistic Annotation Workshop</source>
          . pp.
          <volume>100</volume>
          –
          <fpage>107</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Liga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Argumentative evidences classification and argument scheme detection using tree kernels</article-title>
          .
          <source>In: Proceedings of the 6th Workshop on Argument Mining</source>
          . pp.
          <volume>92</volume>
          –
          <issue>97</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Liga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmirani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting "slippery slope" and other argumentative stances of opposition using tree kernels in monologic discourse</article-title>
          .
          <source>In: International Joint Conference on Rules and Reasoning</source>
          . pp.
          <volume>180</volume>
          –
          <fpage>189</fpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Macagno</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Argumentation schemes. History, classifications, and computational applications</article-title>
          (December 23, 2017). pp.
          <fpage>2493</fpage>
          –
          <lpage>2556</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suarez</surname>
            ,
            <given-names>P.J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de la Clergerie</surname>
            ,
            <given-names>E.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seddah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>CamemBERT: a tasty French language model</article-title>
          . arXiv preprint arXiv:1911.03894 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Niven</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kao</surname>
            ,
            <given-names>H.Y.</given-names>
          </string-name>
          :
          <article-title>Probing neural network comprehension of natural language arguments</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>4658</fpage>
          –
          <lpage>4664</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Janier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>OVA+: An argument analysis interface</article-title>
          .
          <source>In: Computational Models of Argument: Proceedings of COMMA</source>
          . vol.
          <volume>266</volume>
          , p.
          <fpage>463</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daxenberger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stab</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Classification and clustering of arguments with contextualized word embeddings</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>567</fpage>
          –
          <lpage>578</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rinott</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dankin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alzate</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khapra</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aharoni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slonim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Show me your evidence - an automatic method for context dependent evidence detection</article-title>
          .
          <source>In: Proceedings of the 2015 conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>440</fpage>
          –
          <lpage>450</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          . arXiv preprint arXiv:1910.01108 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Walton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macagno</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Argumentation schemes</article-title>
          . Cambridge University Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>5754</fpage>
          –
          <lpage>5764</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>