<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Models for Argumentative Reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luke Thorburn</string-name>
          <email>luke.thorburn@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ariel Kruger</string-name>
          <email>ariel.kruger@unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hunt Lab, University of Melbourne</institution>
          ,
          <addr-line>Parkville, Victoria 3010</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>King's College London</institution>
          ,
          <addr-line>Strand, London WC2R 2LS</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Large transformer-based causal language models are capable of strong performance on many natural language processing tasks. Here, we systematically evaluate the performance of the 2.7 billion parameter GPT Neo pre-trained language model on 6 argumentative reasoning tasks under 5 different optimization strategies, including prompt programming, soft prompts, and parameter tuning. We report both intrinsic evaluation metrics (perplexity), and extrinsic measures of the coherence of model outputs, as judged by an expert human rater. With a few exceptions, the rate at which models produced coherent responses ranged from 15-50%. In contrast, human performance (users of the Kialo argument mapping platform) ranged from 65-82% coherent, depending on the task. These results suggest that larger, suitably optimized language models may be capable of supporting authors and auditors of natural language argument maps in human-in-the-loop settings. We share our finetuned models and code.</p>
      </abstract>
      <kwd-group>
        <kwd>language model</kwd>
        <kwd>argument generation</kwd>
        <kwd>finetuning</kwd>
        <kwd>soft prompt</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Since Douglas Engelbart first envisaged software for authoring structured argumentation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
the goal of an algorithm that can automatically check human reasoning—a “spell checker
for logic”—has been discussed. More broadly, knowledge workers of many types (including
academics, risk analysts, and intelligence analysts) stand to benefit from tools that help them
reliably and efficiently construct coherent arguments. One approach to realizing such tools is
to integrate automated reasoning algorithms with argument mapping software, an approach
that we have taken recently1.
      </p>
      <p>There are many specific argumentative tasks that it may be useful to automate, such as the
generation of reasons and objections, the identification of unstated premises, and the process of
“tightening up” an argument by rewording premises to best represent the most argumentatively
appropriate claim. In resource-constrained settings, it may only be possible to automate these
numerous tasks if there is a common, accessible method that can be applied to all of them. In
this paper, we investigate the extent to which an open-source pre-trained language model, the
2.7 billion parameter version of GPT Neo [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], can be optimized to perform such argumentative
reasoning.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Background</title>
        <p>
          Large transformer-based causal language models are capable of strong performance on many
natural language processing (NLP) tasks [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. This flexibility is made possible by the generality
of causal language modeling—the task of predicting what text comes next, conditioned on
the text that has come before. Any task that can be articulated as a natural language prompt
followed by a response can be posed to a causal language model. For this reason, pre-trained
language models can in some cases serve as few-shot or zero-shot learners [
          <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>
          ] by
including instructions or examples of the task as part of the prompt [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Performance can often
be improved further by tuning some or all of the model weights [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ].
        </p>
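        <p>For illustration, a minimal sketch of posing a task to a pre-trained causal language model as prompt completion (the checkpoint name is the public GPT Neo release; the prompt and sampling settings are hypothetical):</p>
        <preformat># Sketch: zero-shot prompting of a causal LM via prompt completion.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")
prompt = "List reasons why: Argument maps aid critical thinking.\nReasons:\n* "
result = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.9)
print(result[0]["generated_text"])</preformat>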
        <p>
Previous academic work has investigated whether language models can emulate different
types of logical reasoning. For example, El Baff et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] use language models to synthesize
arguments for or against a given claim by training a language model over a vocabulary of
curated argumentative discourse units. Clark et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] find that when facts and rules are
presented in natural language, a language model can reason over a knowledge base with high
accuracy. Gurcke et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] evaluate whether premises are suficient to draw a conclusion by
comparing the stated conclusion with one generated by a language model. Skitalinskaya et
al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] tune a language model to evaluate the quality of argumentative claims. Other work
has investigated the ability of language models to identify logical fallacies [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Increasingly,
commercial prototypes are also demonstrating the potential for language models to assist with
human reasoning tasks in a human-in-the-loop manner. A prominent example is Elicit [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
an “AI research assistant” powered by OpenAI’s GPT-3 language model that includes tools to
brainstorm counterarguments, increase the specificity of claims, and suggest antecedents and
consequences to expand a partial chain of reasoning.
        </p>
        <p>
          Without providing a comprehensive review, we note that there are other approaches to
automating natural language reasoning that do not rely solely on language models. One
prominent example is Project Debater [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ], an attempt to build an autonomous agent that
can compete in formal debate.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Contribution</title>
        <p>
          In this project, we systematically explored the performance of the GPT Neo pre-trained language
model on 6 argumentative reasoning tasks, under 5 different optimization strategies. The tasks,
described in Section 2.2, correspond to tasks commonly performed by an analyst in the course
of authoring an argument map. We report both intrinsic evaluation metrics (perplexity), and
extrinsic measures of the coherence of model outputs, as judged by a human rater. To our
knowledge, this is the first systematic evaluation of a large (&gt;10<sup>9</sup> parameter) language model
on argumentative reasoning tasks, despite the success of such large models elsewhere in NLP
[
          <xref ref-type="bibr" rid="ref19 ref5">5, 19</xref>
          ]. Our results form a baseline for future work, and provide insight into which optimization
strategies are most successful. In addition, we are releasing the finetuned models, along with
code for performing optimization and inference, to aid future research2.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Inputs</title>
      <p>In this section we describe the pre-trained foundation model we used, the NLP tasks for which
we optimized it, and the datasets used for tuning.</p>
      <sec id="sec-2-1">
        <title>2.1. Foundation model</title>
        <p>
          As our foundation model we used the 2.7 billion parameter version of the open-source GPT
Neo language model [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. GPT Neo has a transformer-based decoder architecture designed to
replicate that of GPT-3 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], albeit with fewer and smaller layers than are implemented in the
largest version of GPT-3. For details on the architecture of GPT Neo, we direct the reader to the
original papers on the GPT series of models [
          <xref ref-type="bibr" rid="ref20 ref21 ref5">20, 21, 5</xref>
          ]. GPT Neo was pre-trained on ‘The Pile’,
an 800GB corpus of diverse text collated for language modeling [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Tasks</title>
        <p>We investigated six argumentative NLP tasks, which are described in Table 1.</p>
        <p>Tasks 1-5 could be described as a type of “masked argument modeling” or cloze completion.
They take as inputs a small argument map—potentially a subset of a much larger map—with one
or more claims missing (strictly one in all cases except suggest-intermediary-claims, where
arbitrarily many may be missing). The task is then to generate a claim or claims that could
coherently fill the gap. The five tasks differ in the type and position of the claim that has been
masked (whether it is a reason, co-premise, conclusion, etc.).</p>
        <p>
          All tasks investigated are generative, and intended to aid an analyst as they construct an
argument map by (a) improving the efficiency with which they can compose relevant claims, and
(b) prompting them to consider counter-arguments or implicit assumptions that they may not
otherwise have identified. The optimized models are intended to be integrated with argument
mapping tools where a hi-tree data structure [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] can be assumed. This avoids the need to
rely on an imperfect argument mining pipeline to extract such a structure from argumentation
presented in prose.
        </p>
        <p>
          Task 6, suggest-abstraction, corresponds to a common step in the process of refining an
ill-formed argument map. Often a premise will be too specific in relation to the target claim to
best characterize the nature of the logical relationship between them. In such circumstances,
the analyst should revise the claim to describe a more general or abstract inferential principle,
which is the task we try to automate. Consider the following example, taken from [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
          <p>Nuclear power is generated by the breakdown of heavy elements.</p>
          <p>⟹ Nuclear power has very low greenhouse gas emissions.</p>
          <p>This argument is valid in a vague sense, but the premise is overly narrow and does not well
represent the core reason it supports the conclusion. A better version would be:</p>
          <p>The physics of nuclear power generation involves no combustion.</p>
          <p>⟹ Nuclear power has very low greenhouse gas emissions.</p>
          <p>It is this specific type of revision that the suggest-abstraction task is intended to perform,
under the assumption that such an abstraction is required.</p>
          <p>Table 1 (excerpt). suggest-intermediary-claims: Given a starting claim (a reason) and an end claim, suggest an expanded sequence of claims containing intermediary inferences between the start claim and the end claim. suggest-copremise: Given a claim and one or more co-premises, suggest additional co-premises required for the inference to be valid. suggest-abstraction: Given a claim and a reason, suggest a more abstract version of the reason that better represents the logical relationship between the reason and the claim.</p>
          <p>2The code, along with details of how to download the models, can be found at https://github.com/Hunt-Laboratory/language-model-optimization.</p>
          <p>To formulate all of these tasks so they can be presented to a language model, we format them
as a text prompt, the completion of which constitutes a response to the task. For example, the
suggest-reasons task might be formulated as follows.</p>
          <p>
            List reasons why: &lt;TARGET CLAIM&gt;
Reasons:
* &lt;REASON 1&gt;
* &lt;REASON 2&gt;
* &lt;REASON 3&gt;
*
          </p>
          <p>The model must then generate &lt;REASON 4&gt; to complete the prompt3.
3The prompt templates used can be found at https://github.com/Hunt-Laboratory/language-model-optimization.</p>
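          <p>As an illustrative sketch only (the function name and claims are hypothetical; the actual templates used are available in the repository linked in footnote 3), such a prompt might be assembled as follows:</p>
          <preformat># Hypothetical sketch of assembling a suggest-reasons prompt; not the exact
# templates used in this work (see the project repository for those).
def build_suggest_reasons_prompt(claim, known_reasons):
    lines = ["List reasons why: " + claim, "Reasons:"]
    lines += ["* " + reason for reason in known_reasons]
    lines.append("* ")  # the model is asked to complete this final bullet
    return "\n".join(lines)

prompt = build_suggest_reasons_prompt(
    "Nuclear power has very low greenhouse gas emissions.",
    ["The physics of nuclear power generation involves no combustion."],
)</preformat>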
        </sec>
        <sec id="sec-2-3">
          <title>2.3. Data</title>
          <p>
            Training, validation, and test datasets for each task, where possible, were generated from a
collection of argument maps scraped from the online collaborative argument mapping platform
Kialo4. The scrape was performed in January 2020 by Lenz et al. [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ], and the data was provided
to us on request. The dataset contains 180,736 claims across 563 argument maps, centered on
contentions such as “Traditional bullfighting ... should be banned”, “Climate change can be
reversed”, and “Darwinian evolution is philosophy not science.” The maps have the structure of
a simple argumentation framework: a tree where vertices are claims and edges denote pro or
con relations. The maps underwent several preprocessing steps, the most noteworthy of which
are summarized below.
          </p>
          <p>• Maps were randomly assigned to training (60%), validation (20%), and test (20%) sets.
• Claims falling above a certain distance from the root claim in each map were filtered
out because qualitative exploration suggested that the quality and coherence of claims
decreased the further you go from the root claim. These claims have likely received
less scrutiny on Kialo because they are less salient in the user interface. The depth of
truncation differs slightly for each task, and is specified in Table 2.
• All forks (a claim with its children) and branches (a sequence of supporting claims from a
leaf to the root of a tree) were extracted for each map (see the sketch following this list).
• Depending on the task, the forks and branches were randomly or deterministically
subsetted further to generate a greater number of training, validation, and test examples.
For example, if a fork contained multiple reasons, any subset of those reasons could be
included in a prompt for the s u g g e s t - r e a s o n s task, and any one of those reasons could
be held out to serve as the “correct” completion for the purposes of supervised training
and evaluation. This subsetting process leads to a combinatorial explosion in the number
of candidate examples, so random selection was used to limit the size of the dataset,
where resource constraints required it. At most, the training sets were limited to 50,000
examples, and the validation and test sets to 10,000 examples.</p>
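          <p>To make the fork and branch extraction concrete, the following is a minimal sketch, assuming a simple recursive tree of claims; the data structure and function names are hypothetical, not taken from the released code:</p>
          <preformat># Hypothetical sketch of fork/branch extraction from a Kialo-style argument
# tree; the released preprocessing code may differ.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    children: list = field(default_factory=list)  # child Claim nodes (pro or con)

def forks(node):
    """Yield each claim together with its direct children (a fork)."""
    if node.children:
        yield (node, list(node.children))
    for child in node.children:
        yield from forks(child)

def branches(node, path=()):
    """Yield each sequence of claims connecting the root to a leaf (a branch)."""
    path = path + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from branches(child, path)</preformat>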
          <p>The final number of examples in each of the dataset splits for each task is provided in Table 2.
Note that for both the suggest-copremise and suggest-abstraction tasks, there is no training
or validation data available, because Kialo does not support co-premises, nor does it tag revisions of
claims that could be considered abstractions. For the same reasons, the test set for these tasks is
unlabelled, so performance on these tasks can only be evaluated by human raters.</p>
        </sec>
      </sec>
    <sec id="sec-3">
      <title>3. Optimization</title>
      <p>In this section we describe the strategies, software, and hardware used for optimizing the
foundation model.</p>
      <p>4See https://www.kialo.com/.</p>
      <sec id="sec-3-1">
        <title>3.1. Strategies</title>
        <p>There is a rapidly growing literature on strategies for optimizing pre-trained language models
to perform specific tasks.</p>
        <p>
          Prompt programming refers to one family of methods, in which the pre-trained model weights
are taken as fixed but the structure of the text prompt is tweaked to improve the quality of
the output [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Exploration of different prompt formats can be systematic, but there is an
art to crafting better prompts, guided by heuristics. So-called zero-shot prompts only contain
task-specific instructions and the details of the specific instance of the task being performed. In
contrast, few-shot prompts contain additional complete examples of the task being performed.
In one context, the performance gains afforded by the inclusion of an additional example within
a few-shot prompt have been observed to be roughly equivalent to tuning the full model on 100
examples [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. In other contexts, zero-shot prompts significantly outperform few-shot prompts
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. When using few-shot prompts, systematic experimentation can help determine which
examples to include in the prompt [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], and in what order to list them [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Another prompt
programming strategy is to formulate tasks in a common template, such as question answering
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] or textual entailment [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          In a related approach, a short sequence of additional tokens with randomly initialized
embeddings (known as a soft prompt) is prepended to a minimal input prompt containing the details
of the specific instance of the task to be performed. The embeddings for these additional tokens
are tuned in a supervised fashion while all other model parameters remain fixed [
          <xref ref-type="bibr" rid="ref31 ref32 ref33">31, 32, 33</xref>
          ].
In this way, the “wording” of the prompt can be continuously optimized using conventional
gradient-based optimization methods.
        </p>
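        <p>A minimal sketch of this setup, assuming a Hugging Face causal language model (the soft-prompt length, initialization, and learning rate are placeholders; the actual implementation used in this work was adapted from Parker [37]):</p>
        <preformat># Sketch: tune only a prepended soft prompt; all model weights stay frozen.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
for param in model.parameters():
    param.requires_grad = False  # pre-trained weights remain fixed

n_soft, dim = 20, model.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(0.01 * torch.randn(n_soft, dim))
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)

def soft_prompt_loss(prompt, response):
    ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids)
    inputs = torch.cat([soft_prompt.unsqueeze(0), embeds], dim=1)
    # Ignore the soft-prompt positions when computing the loss.
    ignore = torch.full((1, n_soft), -100, dtype=torch.long)
    labels = torch.cat([ignore, ids], dim=1)
    return model(inputs_embeds=inputs, labels=labels).loss</preformat>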
        <p>
          There are a number of proposed strategies for selectively tuning a subset of the parameters
in the main body of the model, short of tuning the full model. For example, tuning only the bias
parameters can be both computationally efficient and effective [
          <xref ref-type="bibr" rid="ref34 ref9">34, 9</xref>
          ]. Alternately, meta-tuning
describes the approach in which the foundation model is first tuned for a general task such as
instruction following or question answering, before being applied to specific tasks of interest
[
          <xref ref-type="bibr" rid="ref30 ref6">30, 6</xref>
          ].
        </p>
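        <p>For the bias-only variant, a sketch under the assumption that bias parameters can be selected by name (the selection criterion and learning rate are illustrative, not the setup used here):</p>
        <preformat># Sketch: freeze all weights except bias terms, then tune only those.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")  # assumption: biases named "*bias"
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)</preformat>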
        <p>Given our focus on exploring the applicability of pre-trained language models across multiple
argumentative reasoning tasks—particularly in data-limited settings—we selected 5 optimization
strategies that we evaluated (data and funding permitting) for each of the 6 tasks. Informed by
the above literature, the strategies evaluated were:
• zero-shot prompt, no tuning
• few-shot prompt, no tuning
• soft prompt + zero-shot prompt, only the soft prompt tuned
• zero-shot prompt, only bias parameters tuned
• zero-shot prompt, all parameters tuned</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Software</title>
        <p>
          Model tuning, evaluation, and text generation were performed in Python using PyTorch [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]
via the Hugging Face transformers library [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], with some custom class extensions to account
for bespoke data loading, logging of evaluation metrics, and the insertion of soft prompts.
We used the WarmupLR learning rate scheduler with the AdamW optimizer, a batch size of
32, and continued tuning until the validation loss (evaluated relatively infrequently due to
computational cost) was observed to increase. The code for creating, tuning, and generating
text in the presence of soft prompts was adapted from Parker [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ].
        </p>
        <p>
          Training large neural networks while avoiding out-of-memory errors can be challenging.
To manage parallel and efficient training and evaluation of GPT-Neo on limited hardware we
used the Zero Redundancy Optimizer (ZeRO, stage 2), as implemented in the DeepSpeed library
[
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. Within this framework, optimizer states and model weights are partitioned across parallel
processes such that each process updates only its partition, and retains only the gradients
corresponding to its portion of the optimizer states, whilst also offloading optimizer memory
and computation to the CPU. This avoided out-of-memory errors and allowed training to be
performed.
        </p>
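        <p>For illustration, a minimal DeepSpeed configuration along these lines might look as follows; the batch size, optimizer, and scheduler reflect the setup described above, while the learning-rate and warmup values are placeholders:</p>
        <preformat># Illustrative DeepSpeed config: ZeRO stage 2 with optimizer offload to CPU.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "scheduler": {"type": "WarmupLR", "params": {"warmup_num_steps": 500}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}</preformat>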
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Hardware</title>
        <p>Training and evaluation were performed remotely on a commercial virtual machine with four
NVIDIA Quadro RTX 6000 GPUs, twenty-four AMD EPYC 7502P (2.50 GHz) virtual CPUs, and
2.78TB of storage. In total, the virtual machine had 96GB of virtual RAM (24GB per GPU), and
184 GB of conventional RAM. Total cloud compute costs were about USD 1250, and the tuning
for all tasks and strategies took place over 228 hours.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>In this section we describe the methods used to evaluate each optimization strategy on each
task, along with our results.</p>
      <sec id="sec-4-1">
        <title>4.1. Methods</title>
        <p>Where possible, we evaluated each application of an optimization strategy to a task using both
automated (intrinsic) and manual (extrinsic) methods. The inclusion of manual evaluation was
to provide more interpretable insight into the quality of the model outputs, and to allow us
to evaluate model outputs on unsupervised tasks where only prompts—not responses—were
available (namely, suggest-copremise and suggest-abstraction).</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Automated</title>
          <p>For each task that included responses in the test set, we calculated the perplexity of each
approach across all examples in the test set. Perplexity is a standard evaluation measure for
language models in the supervised setting and is equal to the exponential of the average
cross-entropy across tokens in the evaluation text, relative to the model. The lower the perplexity of
a sequence of tokens, the greater the likelihood the model assigns to that sequence. Perplexity
is not a measure of reasoning quality specifically, but in this context captures how likely the
model was to generate the “correct”, human-written argumentative claims that form the gold
standard responses to the prompts.</p>
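          <p>Concretely, for a token sequence x<sub>1</sub>, …, x<sub>N</sub> assigned conditional probabilities p(x<sub>i</sub> | x<sub>&lt;i</sub>) by the model, this standard definition is</p>
          <disp-formula>
            <tex-math>\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{&lt;i}) \right)</tex-math>
          </disp-formula>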
          <p>We calculated perplexity in two ways: (a) using the mean cross-entropy across all tokens (in
both the prompts and responses) and (b) using the mean cross-entropy across only the response
tokens (though the prompts were still fed through the model to allow it to condition on them).
Perplexity across all the tokens is substantially lower because the prompts contain repetitive boilerplate
(which could be memorized during tuning) and, further, this measure corresponds to the loss
function on which the models were tuned, where such tuning occurred. That said, the second
perplexity measure (considering only the response tokens) is perhaps more meaningful, given
that we ultimately care about generating unknown outputs and not regenerating the input
prompts.</p>
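          <p>A minimal sketch of the response-only variant, assuming a Hugging Face causal language model (this illustrates the measure; it is not the released evaluation code). Masking the prompt positions with the ignore index excludes them from the averaged cross-entropy, while the model still conditions on them:</p>
          <preformat># Sketch: perplexity over response tokens only, conditioned on the prompt.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

def response_perplexity(prompt, response):
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = ids.clone()
    labels[:, :prompt_len] = -100  # ignore prompt tokens in the loss average
    with torch.no_grad():
        loss = model(ids, labels=labels).loss  # mean cross-entropy, response only
    return math.exp(loss.item())</preformat>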
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Manual</title>
          <p>Manual evaluation was performed using a bespoke method and rubric. We randomly sampled
100 examples from the test set for each task. Then, under all optimization strategies for each task,
we generated sample responses of length 150 tokens for each of these 100 examples, conditioned
on the appropriately formatted prompt. The temperature used for generation was 0.9. These
generated outputs were cleaned according to the following rules (a code sketch follows the list).</p>
          <p>• Rounded parentheses and their contents were removed, along with asterisks, underscores,
backticks, any leading numbered list indices (e.g. “1. ”), trailing whitespace, and all
characters before the first letter, number, or quotation mark.
• The text was split into lines, and lines into sentences. Only the first line was retained and,
unless the task was suggest-intermediary-claims, only the first sentence on the first line.
• The first letter was capitalized.
• If the task was suggest-intermediary-claims, the string was split on the claim delimiter
(either “~” or “=&gt;”), and the first and last claims were removed on the assumption that
they had been correctly reproduced from the input prompt.</p>
          <p>Table 3 (coherence rating rubric; the four levels, in order, correspond to “Incoherent−”, “Incoherent+”, “Coherent−”, and “Coherent+”).
• Suggestion (as written) is not relevant or coherent, and there is no insight to be gained from it.
• Suggestion (as written) is not relevant or coherent, but the suggestion prompts the user to think of adjacent ideas or suggestions that are relevant and coherent.
• Suggestion (as written) is relevant and coherent, but some editing is required to be usable.
• Suggestion (as written) is relevant and coherent, and would be usable as written.</p>
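          <p>A sketch of these cleaning rules in code (the regular expressions are illustrative reconstructions, not the authors’ exact implementation):</p>
          <preformat># Sketch of the output-cleaning rules described above; details are illustrative.
import re

def clean_output(text, task):
    text = re.sub(r"\([^)]*\)", "", text)        # remove parentheses and contents
    text = re.sub(r"[*_`]", "", text)            # remove asterisks, underscores, backticks
    text = re.sub(r"^\s*\d+\s*\.\s*", "", text)  # remove leading list indices, e.g. "1. "
    text = re.sub(r"^[^\w\"']+", "", text)       # drop chars before first letter/number/quote
    lines = [l for l in text.splitlines() if l.strip()]
    line = lines[0].strip() if lines else ""
    if task != "suggest-intermediary-claims":
        line = re.split(r"[.!?]\s", line)[0]     # keep only the first sentence
    return line[:1].upper() + line[1:]</preformat>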
        <p>The generated outputs were pooled with the human-generated claims from Kialo (to provide
a human benchmark) and sorted according to randomly assigned IDs for each test example,
such that tasks and strategies appeared in a random order, but outputs for each example task
appeared consecutively. The example tasks and responses were presented to raters in this order
(to reduce cognitive switching costs), and formatted as small argument maps using a custom
interface5, in which the claims to be rated were highlighted. Raters were blind to the source of
the highlighted claims.</p>
        <p>Each generated output was rated for coherence, using the rubric in Table 3. In this context, a
claim is understood to be coherent (either “Coherent−” or “Coherent+”) if it is (a) able to be
understood, and (b) is logically consistent with neighboring claims, in the manner implied by
its position in the argument map. Note, claims can be coherent without being true6. We chose
to evaluate coherence because it arguably represents the minimum requirement for generated
model outputs to be useful to a human analyst, and is a dimension of quality against which all
six tasks can be evaluated.</p>
        <p>The rubric and the rating interface were developed over two pilot rating rounds. Each model
output was rated once by one of two raters (the authors of this study), both of whom are familiar
with argument mapping conventions.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>
          In all training runs, learning was successful, with an initial rapid decrease in loss followed by a
plateauing of the loss functions on both the training and validation sets. Severe overfitting (e.g.
a U-shaped validation loss curve) was not seen within the training durations observed. As more
parameters were made available for tuning, fewer examples were needed to reach the point of
(mild) overfitting. For example, when tuning all parameters in the model, the model started
to overfit less than 10% of the way through the first epoch. In contrast, the soft prompt and
bias-only tuning strategies took at least one full epoch to reach the point of overfitting.
5A variant of the interface at https://luke-thorburn.github.io/argument-processor/.
6We originally included truth as a second dimension in the rating rubric. However, in practice the truth status of
most model suggestions was either not able to be assessed (because they were normative or nonsensical), or was
not verifiable in the time available.
        </p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Automated</title>
          <p>
            The results of the automated evaluation are shown in Table 4. Across all strategy/task
combinations studied, full-text perplexity ranges between about 5 and 25. For reference, GPT-Neo-2.7B
achieves a perplexity of 5.646 on its test set from The Pile [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. The zero-shot, no tuning strategy
consistently performed the most poorly, whilst the soft prompt strategy produced the lowest
perplexities observed across all tasks.
          </p>
          <p>When considering perplexity calculated only using the response tokens, the picture changes.
In general, perplexity values are much higher here due to the exclusion of repetitive prompt
boilerplate. The soft prompt strategy still performs competitively, but it is the few-shot strategy
with no tuning that produced the lowest perplexity across all tasks. In contrast, the strategies
with greater numbers of parameters tuned often performed more poorly, especially so in the
case of the bias-only tuning strategy. This may be because they overfit to the prompt boilerplate
at an earlier checkpoint, but this was not noticed in the training metrics because they only
included the full text (both prompt and response).</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Manual</title>
          <p>The results of the manual evaluation are shown in Table 5. In general, the picture emerging
from the manual evaluations is less clear than that of the automated evaluations, with different
optimization strategies performing best on different tasks. This may in part be due to the
relatively small sample of 100 examples rated, as well as noise due to the imperfect reliability of
the rating scale.</p>
          <p>With a few exceptions, the rate at which models produced coherent responses ranged roughly
between 15% and 50%, which may be acceptable in a human-in-the-loop setting where multiple
suggestions can be generated concurrently and those that are incoherent quickly discarded.
Notably, the relatively more subtle tasks suggest-abstraction and suggest-copremise achieved
coherence rates of 18% and 25% using merely a few-shot prompt and no parameter tuning.</p>
        <p>We also rated the original human outputs from Kialo, where available, to provide a benchmark.
Whilst better than all the language model outputs, the human benchmark is relatively low, not
rising above 82% coherence. This reinforces the difficulty of the tasks (at least when posed to
crowdsourced teams) and also raises questions about the quality of Kialo as a source of training
data.</p>
          <p>Figure 1 includes two examples of generated model outputs for the suggest-objections task.
The complete set of generated examples along with their ratings can be found in the project
GitHub repository7.</p>
          <p>
            From one perspective, coherence is a low bar. A suggestion can be coherent without being
new, true, important, or eloquent. On the other hand, coherence is a significant milestone,
revealing an ability to abide by conventional rules of logical argumentation. The observed
coherence rates were achieved in a model that is at least two orders of magnitude smaller
than commercial models that are state-of-the-art [
            <xref ref-type="bibr" rid="ref39">39</xref>
            ], with limited exploration of the space of
optimization regimes. It is foreseeable that with larger language models and more dedicated
effort, an automated approach to argumentative reasoning tasks could reach coherence on par
with that of Kialo users.
7Available at https://github.com/Hunt-Laboratory/language-model-optimization.
          </p>
        <sec id="sec-4-2-1">
          <title>Objections:</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>Claim:</title>
        </sec>
        <sec id="sec-4-2-3">
          <title>Objections:</title>
          <p>C u l t u r a l a p p r o p r i a t i o n i s w r o n g .</p>
          <p>(a) Example of model output that was rated “Coherent +”.</p>
          <p>P r i v a t e s c h o o l s p r e s e r v e t r a d i t i o n s t h a t a r e a b s e n t , o r o t h e r w i s e i m p r a c t i c a l t o
m a i n t a i n , i n t h e s t a t e s y s t e m .
• G i v e n t h i s h a s n e v e r b e e n t r i e d , a n d n o e x a m p l e s a r e g i v e n , t h e r e i s n o r e a s o n t o
a s s u m e t h e s e t r a d i t i o n s c a n n o t b e m o v e d t o a p u b l i c s y s t e m .
• N o t a l l o f t h e s e t r a d i t i o n s a r e g o o d , a n d m a n y c a n p e r p e t u a t e s o c i o - e c o n o m i c d i v i d e s
f a r b e y o n d t h e s c h o o l s y s t e m , f o r e x a m p l e b y c r e a t i n g ‘ o l d b o y s c l u b s ’ .
• T h i s i s a ‘ w e m u s t p r e s e r v e t h e s e t r a d i t i o n s f o r o u r d a u g h t e r s ’ , r a t h e r t h a n a ‘ i t i s
t o o u r c h i l d r e n ’ a r g u m e n t .</p>
          <p>(b) Example of model output that was rated “Incoherent −”.
efort, an automated approach to argumentative reasoning tasks could reach coherence on par
with that of Kialo users.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Recently, large language models have come to dominate the field of natural language processing,
but arguably remain underexplored in the computational argumentation literature. In this paper,
we systematically evaluated the performance of a 2.7 billion parameter pretrained language
model across 6 argumentative reasoning tasks, using 5 different optimization strategies. With a
few exceptions, the rate at which the models produced coherent responses ranged from 15-50%,
compared to human performance of 65-82%. We share our finetuned models and code.</p>
      <p>To our knowledge the language model studied is larger than those previously considered in
the argumentation literature, but it has at least two orders of magnitude fewer parameters than
those that are state of the art on other NLP tasks, and the labeled data used for finetuning was
of dubious quality. Natural next steps would be to evaluate the performance of much larger
pretrained language models on the same argumentative reasoning tasks, and to invest in the
development of larger, high-quality labeled datasets of natural language argumentation to use
for finetuning.</p>
      <p>That said, language models fundamentally model statistical—rather than logical—relationships
between words, and it is not clear whether bigger models and better data alone will be sufficient
to produce reliably coherent results. Thus, it would be valuable to explore how language
models could be combined with symbolic argumentation methods to improve the coherence of
generated arguments.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was funded by the Australian Department of Defence and the Office of National
Intelligence under the AI for Decision Making Program, delivered in partnership with the
Defence Science Institute in Victoria. The authors would like to thank other members of
the Hunt Lab, particularly Tim van Gelder and Ashley Barnett for helpful discussions, and
anonymous reviewers for their constructive feedback.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Engelbart</surname>
          </string-name>
          ,
          <article-title>Augmenting Human Intellect: A Conceptual Framework</article-title>
          ,
          <source>Summary Report Project #3578</source>
          , Stanford Research Institute,
          <year>1962</year>
          . URL: https://apps.dtic.mil/sti/pdfs/ AD0289565.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leahy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <article-title>GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow</article-title>
          ,
          <source>Zenodo</source>
          ,
          <year>2021</year>
          . doi:10.5281/zenodo.5297715.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Branwen</surname>
          </string-name>
          ,
          <article-title>GPT-3 Creative Fiction</article-title>
          ,
          <year>2020</year>
          . URL: https://www.gwern.net/GPT-3.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Branwen</surname>
          </string-name>
          ,
          <article-title>GPT-3 Nonfiction</article-title>
          ,
          <year>2020</year>
          . URL: https://www.gwern.net/GPT-3-nonfiction
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language Models are Few-Shot Learners</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper/ 2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Guu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Finetuned Language Models Are Zero-Shot Learners</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2109.01652.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>True Few-Shot Learning with Language Models</article-title>
          , in: M.
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Dauphin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          <string-name>
            <surname>Vaughan</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>34</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2021</year>
          , pp.
          <fpage>11054</fpage>
          -
          <lpage>11070</lpage>
          . URL: https: //proceedings.neurips.cc/paper/2021/file/5c04925674920eb58467fb52ce4ef728-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <article-title>True Few-Shot Learning with Prompts-A Real-World Perspective, Transactions of the Association for Computational Linguistics 10 (</article-title>
          <year>2022</year>
          )
          <fpage>716</fpage>
          -
          <lpage>731</lpage>
          .
          doi:10.1162/tacl_a_00485.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Balazevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2106.13353.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khabsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          , H. Ma,
          <article-title>Entailment as Few-Shot Learner</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2104.14690.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>El Baff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Al Khatib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Computational Argumentation Synthesis as a Language Modeling Task</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on Natural Language Generation</source>
          , Association for Computational Linguistics, Tokyo, Japan,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>64</lpage>
          .
          doi:10.18653/v1/W19-8607.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tafjord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <article-title>Transformers as Soft Reasoners over Language</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2020</year>
          ).
          doi:10.48550/arXiv.2002.05867.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gurcke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <article-title>Assessing the Sufficiency of Arguments through Conclusion Generation</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2110.13495.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Skitalinskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          , Learning From Revisions:
          <article-title>Quality Assessment of Claims in Argumentation at Scale, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>1718</fpage>
          -
          <lpage>1729</lpage>
          . doi:10.18653/v1/2021.eacl-main.147.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lalwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vaidhya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          , Logical Fallacy Detection,
          <year>2022</year>
          .
          doi:10.48550/arXiv.2202.13758.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          Ought Inc., Elicit,
          <year>2022</year>
          . URL: https://elicit.org/, accessed: 2022-04-29.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N.</given-names>
            <surname>Slonim</surname>
          </string-name>
          ,
          <article-title>Project Debater</article-title>
          ,
          <source>in: Computational Models of Argument: Proceedings of COMMA</source>
          <year>2018</year>
          ,
          <year>2018</year>
          , p.
          <fpage>4</fpage>
          .
          doi:10.3233/978-1-61499-906-5-4.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Slonim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bilu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alzate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bogin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Choshen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cohen-Karlik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dankin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Edelstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ein-Dor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Friedman-Melamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gavron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gleize</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gutfreund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halfon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hershcovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoory</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jacovi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jochim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kantor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Konopnicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kotlerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lahav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Menczel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moshkowich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ofek-Koifman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Orbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rinott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shechtman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sheinwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shnarch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shnayderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spector</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sznajder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Toledo-Ronen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Venezian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharonov</surname>
          </string-name>
          ,
          <article-title>An Autonomous Debating System</article-title>
          ,
          <source>Nature</source>
          <volume>591</volume>
          (
          <year>2021</year>
          )
          <fpage>379</fpage>
          -
          <lpage>384</lpage>
          .
          doi:10.1038/s41586-021-03215-w.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsvyashchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maynez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Austin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gur-Ari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Duke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Michalewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spiridonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sepassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Omernick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Pillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pellat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lewkowycz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Polozov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Saeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meier-Hellstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          ,
          <article-title>PaLM: Scaling Language Modeling with Pathways</article-title>
          ,
          <year>2022</year>
          .
          doi:10.48550/arXiv.2204.02311.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Improving Language Understanding by Generative Pre-Training</article-title>
          ,
          <source>Technical Report, OpenAI</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Language Models Are Unsupervised Multitask Learners</article-title>
          ,
          <source>Technical Report, OpenAI</source>
          ,
          <year>2019</year>
          . URL: http://www.persagen.com/files/misc/radford2019language.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hoppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Phang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nabeshima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Presser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leahy</surname>
          </string-name>
          ,
          <article-title>The Pile: An 800GB Dataset of Diverse Text for Language Modeling</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2101.00027.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Marriott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sbarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>van Gelder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulka</surname>
          </string-name>
          ,
          <article-title>Hi-Trees and Their Layout</article-title>
          ,
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>17</volume>
          (
          <year>2011</year>
          )
          <fpage>290</fpage>
          -
          <lpage>304</lpage>
          .
          doi:10.1109/TVCG.2010.45.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>van Gelder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monk</surname>
          </string-name>
          , Argument Mapping Short Course,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sahitaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kallenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Coors</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dumani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Towards an Argument Mining Pipeline Transforming Texts to Argument Graphs</article-title>
          ,
          <source>in: Computational Models of Argument: Proceedings of COMMA</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>270</lpage>
          .
          doi:10.3233/FAIA200510.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Prompting: Better Ways of Using Language Models for NLP Tasks</article-title>
          ,
          <source>The Gradient</source>
          (
          <year>2021</year>
          ). URL: https://thegradient.pub/prompting/.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>How Many Data Points is a Prompt Worth?</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2103.08493.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Carin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>What Makes Good In-Context Examples for GPT-3?</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2101.06804.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bartolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <article-title>Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2104.08786.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <article-title>Meta-tuning Language Models to Answer Prompts Better</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2104.04670.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <article-title>The Power of Scale for Parameter-Efficient Prompt Tuning</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2104.08691.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>GPT Understands, Too</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2103.10385.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>G.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisner</surname>
          </string-name>
          ,
          <article-title>Learning How to Ask: Querying LMs with Mixtures of Soft Prompts</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2104.06599.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Zaken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravfogel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ).
          doi:10.48550/arXiv.2106.10199.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <article-title>PyTorch: An Imperative Style, High-Performance Deep Learning Library</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>d'Alché-Buc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Garnett</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          ,
          Curran Associates, Inc.,
          <year>2019</year>
          , pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art Natural Language Processing</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          ,
          Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>K.</given-names>
            <surname>Parker</surname>
          </string-name>
          ,
          <article-title>soft-prompt-tuning</article-title>
          ,
          <year>2021</year>
          . URL: https://github.com/kipgparker/soft-prompt-tuning, accessed: 2021-11-20.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rasley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajbhandari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ruwase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters</article-title>
          ,
          <source>in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>3505</fpage>
          -
          <lpage>3506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <article-title>Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ). doi:10.48550/arXiv.2101.03961.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>