<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiangci Li</string-name>
          <email>lixiangci8@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gully Burns</string-name>
          <email>gully.burns@chanzuckerburg.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nanyun Peng</string-name>
          <email>violetpeng@cs.ucla.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chan Zuckerberg Initiative</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California Los Angeles</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Texas at Dallas</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Work performed at Information Sciences Institute, Viterbi School of Engineering, University of Southern California</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Even for domain experts, it is a non-trivial task to verify a scientific claim by providing supporting or refuting evidence rationales. The situation worsens as misinformation proliferates on social media and news websites, manually or programmatically, at every moment. As a result, an automatic fact-verification tool becomes crucial for combating the spread of misinformation. In this work, we propose a novel, paragraph-level, multi-task learning model for the SCIFACT task by directly computing a sequence of contextualized sentence embeddings from a BERT model and jointly training the model on rationale selection and stance prediction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Many seemingly convincing rumors such as “Most humans
only use 10 percent of their brain” are widely spread, but
ordinary people are not able to rigorously verify them by
searching for scientific literature. In fact, it is not a trivial
task to verify a scientific claim by providing supporting or
refuting evidence rationales, even for domain experts. The
situation worsens as misinformation proliferates on
social media and news websites, manually or programmatically,
at every moment. As a result, an automatic fact-verification
tool becomes more and more crucial for combating the
spread of misinformation.</p>
      <p>
        The existing fact-verification tasks usually consist of three
sub-tasks: document retrieval, rationale sentence extraction,
and fact-verification. However, due to the nature of scientific
literature that requires domain knowledge, it is challenging
to collect a large-scale scientific fact-verification dataset, and
further, to perform fact-verification under a low-resource
setting with limited training data.
        <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
        collected a scientific claim-verification dataset, SCIFACT, and
proposed a scientific claim-verification task: given a
scientific claim, find evidence sentences that support or refute the
claim in a corpus of scientific paper abstracts.
        <xref ref-type="bibr" rid="ref33">Wadden et al.
(2020)</xref>
        also proposed a simple, pipeline-based,
sentence-level model, VERISCI, as a baseline solution based on
        <xref ref-type="bibr" rid="ref12">DeYoung et al. (2019)</xref>
        .
      </p>
      <p>VERISCI is a pipeline model that runs modules for
abstract retrieval, rationale sentence selection, and stance
prediction sequentially, and thus the error generated from an
upstream module may propagate to the downstream
modules. To overcome this drawback, we hypothesize that a
module jointly optimized on multiple sub-tasks may
mitigate the error-propagation problem to improve the overall
performance. In addition, we observe that a complete set
of rationale sentences usually contains multiple inter-related
sentences from the same paragraph. Therefore, we propose
a novel, paragraph-level, multi-task learning model for the SCIFACT task.</p>
      <sec id="sec-1-1">
        <title>Overview and Contributions</title>
        <p>
          In this work, we employ compact paragraph encoding, a
novel strategy of computing sentence representations using
BERT-family models. We directly feed an entire paragraph
as a single sequence to BERT, so that the encoded sentence
representations are already contextualized on the neighbor
sentences by taking advantage of the attention mechanisms
in BERT. In addition, we jointly train the modules for
rationale selection and stance prediction as multi-task learning
          <xref ref-type="bibr" rid="ref7">(Caruana 1997)</xref>
          by leveraging the confidence score of
rationale selection as the attention weight of the stance prediction
module. Furthermore, we compare two methods of transfer
learning that mitigate the low-resource issue: pre-training
and domain adaptation
          <xref ref-type="bibr" rid="ref10 ref22">(Peng and Dredze 2017)</xref>
          . Our
experiments show that:
• The compact paragraph encoding method is beneficial
over separately computing sentence embeddings.
• With negative sampling, the joint training of rationale
selection and stance prediction is beneficial over the pipeline
solution.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 SCIFACT Task Formulation</title>
      <p>
        Given a scientific claim c and a corpus of scientific
paper abstracts A, the SCIFACT
        <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
        task retrieves all abstracts E(c) that either SUPPORT
or REFUTE c. Specifically, the stance prediction (a.k.a.
label prediction) task classifies each abstract a ∈ A
into y(c, a) ∈ {SUPPORTS, REFUTES, NOINFO} with
respect to each claim c; the rationale selection (a.k.a.
sentence selection) task retrieves all rationale sentences
S(c, a) = {s_1(c, a), ..., s_l(c, a)} of each a that SUPPORT
or REFUTE c. The performance of both tasks
is evaluated with the F1 measure at both abstract-level and
sentence-level, as defined by
        <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
        , where
{SUPPORTS, REFUTES} are considered the positive
labels and NOINFO the negative label for stance prediction.
      </p>
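<p>For concreteness, the sentence-level metric can be sketched as follows. This is an illustrative simplification with hypothetical set-valued inputs; the official evaluation of Wadden et al. (2020) additionally requires the predicted stance label to be correct for the Selection+Label score.</p>

```python
def sentence_f1(gold, pred):
    """Micro-averaged sentence-level precision/recall/F1 over claims.

    gold, pred: dict mapping claim_id -> set of (doc_id, sent_idx)
    rationale identifiers. A predicted sentence counts as correct
    only if it is a gold rationale for the same claim.
    """
    tp = fp = fn = 0
    for claim_id in gold.keys() | pred.keys():
        g = gold.get(claim_id, set())
        p = pred.get(claim_id, set())
        tp += len(g & p)   # correctly selected rationale sentences
        fp += len(p - g)   # selected but not gold
        fn += len(g - p)   # gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```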
    </sec>
    <sec id="sec-3">
      <title>3 Approach</title>
      <p>
        We formulate the SCIFACT task
        <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
        as a
sentence-level sequence-tagging problem. We first apply an
abstract retrieval module to filter out negative candidate
abstracts that do not contain sufficient information with respect
to each given claim. Then we propose a novel model for
joint rationale selection and stance prediction using
multi-task learning
        <xref ref-type="bibr" rid="ref7">(Caruana 1997)</xref>
        .
      </p>
      <sec id="sec-3-1">
        <title>3.1 Abstract Retrieval</title>
        <p>
          In contrast to the TF-IDF similarity used by Wadden et al.
(2020), we leverage the BioSentVec
          <xref ref-type="bibr" rid="ref20 ref38 ref8">(Chen, Peng, and Lu 2019)</xref>
          embedding, which is the biomedical version of Sent2Vec
          <xref ref-type="bibr" rid="ref21 ref36 ref37">(Pagliardini, Gupta, and Jaggi 2018)</xref>
          , for a fast and scalable
sentence-level similarity computation. We first compute the
BioSentVec
          <xref ref-type="bibr" rid="ref20 ref38 ref8">(Chen, Peng, and Lu 2019)</xref>
          embedding of each
abstract in the corpus by treating the concatenation of each
title and abstract as a single sentence. Then for each given
claim, we compute the cosine similarities of the claim
embedding against the pre-computed abstract embeddings, and
choose the top k_retrieval most similar abstracts as the candidate
abstracts for the next module.
        </p>
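<p>The retrieval step thus reduces to a top-k cosine-similarity search over pre-computed embeddings. A minimal NumPy sketch with stand-in vectors (in our system, the vectors come from the pretrained BioSentVec model):</p>

```python
import numpy as np

def top_k_abstracts(claim_vec, abstract_vecs, k):
    """Return indices of the k abstracts most cosine-similar to the claim.

    claim_vec: (d,) embedding of the claim.
    abstract_vecs: (n, d) pre-computed embeddings, one per abstract
    (title and abstract concatenated and embedded as one "sentence").
    """
    a = abstract_vecs / np.linalg.norm(abstract_vecs, axis=1, keepdims=True)
    c = claim_vec / np.linalg.norm(claim_vec)
    sims = a @ c                   # cosine similarities, shape (n,)
    return np.argsort(-sims)[:k]   # indices of the top-k most similar
```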
        <sec id="sec-3-1-1">
          <title>3.2 Joint Rationale Selection and Stance Prediction Model</title>
          <p>
Compact Paragraph Encoding. A common usage of
BERT-family models
            <xref ref-type="bibr" rid="ref11 ref17">(Devlin et al. 2018; Liu et al. 2019)</xref>
            for
sentence-level sequence tagging is to compute each sentence
embedding in a paragraph in separate batches. Since each batch is
independent, such a method leaves the contextualization of the
sentences to the subsequent modules. Instead, we propose
a novel method of encoding paragraphs by directly feeding
the concatenation of the claim c and the whole paragraph
P to a BERT model as a single sequence Seq. By
separating each sentence s_i with the BERT model’s [SEP]
token, we fully leverage the multi-head attention
            <xref ref-type="bibr" rid="ref28">(Vaswani
et al. 2017)</xref>
            within the BERT model to compute
contextualized word representations h_Seq with respect to the claim
sentence and the whole paragraph.
          </p>
          <p>c = [cw_1, cw_2, ..., cw_n]
s_i = [w_1, w_2, ..., w_m]
P = [s_1, s_2, ..., s_l]
Seq = [c [SEP] s_1 [SEP] s_2 [SEP] ... [SEP] s_l] (1)
h_Seq = BERT(Seq) ∈ R^(len(Seq) × d_BERT) (2)
h_Seq = [h_CLS, h_cw1, ..., h_cwn, h_SEP, h_w1, ..., h_wm, h_SEP, ...] (3)</p>
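<p>The construction of Seq can be sketched as follows, with token strings standing in for real wordpiece IDs; the recorded spans are later used to pool each sentence’s contextualized vectors (a real tokenizer would also prepend [CLS]):</p>

```python
def build_sequence(claim_tokens, paragraph_sentences):
    """Concatenate claim and paragraph into one sequence,
    Seq = [claim, [SEP], s1, [SEP], s2, ...], recording the token
    span of each sentence so its contextualized vectors can be
    pooled afterwards.
    """
    seq = list(claim_tokens)
    spans = []                 # (start, end) token span per sentence
    for sent in paragraph_sentences:
        seq.append("[SEP]")    # sentence boundary marker
        start = len(seq)
        seq.extend(sent)
        spans.append((start, len(seq)))
    return seq, spans
```

Feeding the whole sequence to BERT at once is what lets the attention layers contextualize every sentence on its neighbors and on the claim.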
        </sec>
        <sec id="sec-3-1-3">
          <title>Sentence Representations via Word-level Attention</title>
          <p>Next, we apply a weighted sum to the contextualized word
representations of each sentence to compute the
sentence representation h_si. The weights are obtained by
applying a self-attention SelfAttn_word, a two-layer
multi-layer perceptron, on the word representations in the scope of
each sentence, as separated by the [SEP] tokens.

h_si = SelfAttn_word([h_SEP, h_w1, ..., h_wm]) ∈ R^d_BERT</p>
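<p>The word-level attention pooling can be sketched in NumPy as follows (the parameter matrices W1, b1, w2, b2 are hypothetical stand-ins for the trained two-layer perceptron):</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attn_pool(h_words, W1, b1, w2, b2):
    """Pool word vectors (m, d) into one sentence vector (d,).

    A two-layer MLP scores each word; a softmax over the scores
    gives the attention weights; the sentence representation is
    the weighted sum of the word vectors.
    """
    scores = np.tanh(h_words @ W1 + b1) @ w2 + b2   # (m,) word scores
    alpha = softmax(scores)                          # attention weights
    return alpha @ h_words                           # (d,) sentence vector
```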
        </sec>
        <sec id="sec-3-1-4">
          <title>Dynamic Rationale Representations</title>
          <p>We use a two-layer
multi-layer perceptron MLP_rationale to compute the
rationale score and use the softmax function to compute the
probability of each candidate sentence being a rationale
sentence (p_r) or not (p_not_r) with respect to the claim sentence c.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Rationale Selection</title>
        <p>Then we only feed the selected rationale sentences r into the next stance
prediction module.</p>
        <p>p_not_r_i, p_r_i = softmax(MLP_rationale(h_si)) ∈ (0, 1)
h_ri ← h_si if p_not_r_i &lt; p_r_i</p>
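<p>The rationale gating step then reduces to a row-wise softmax and a boolean mask, sketched below with illustrative raw scores in place of the MLP_rationale outputs:</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def select_rationales(h_sents, scores):
    """Keep only the sentences predicted as rationales.

    h_sents: (l, d) sentence representations.
    scores: (l, 2) raw MLP_rationale outputs, columns = [not_r, r].
    Returns the kept representations and the boolean mask.
    """
    p = softmax(scores)          # (l, 2) probabilities per sentence
    keep = p[:, 1] > p[:, 0]     # keep when p_r > p_not_r
    return h_sents[keep], keep
```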
      </sec>
      <sec id="sec-3-3">
        <title>Stance Prediction</title>
        <p>
          We use two variants for stance prediction: a simple sentence-level attention and the Kernel Graph
Attention Network (KGAT)
          <xref ref-type="bibr" rid="ref18">(Liu et al. 2020)</xref>
          .
• Simple Attention. We apply another weighted
summation to the predicted rationale sentence representations
h_ri to compute the whole paragraph’s rationale
representation h_r, where the attention weights are obtained by
applying another self-attention, SelfAttn_sentence, to the
rationale sentence representations. Finally, we apply
another two-layer multi-layer perceptron MLP_stance and
the softmax function to compute the probability of the
paragraph serving the role of {SUPPORTS, REFUTES,
NOINFO} with respect to the claim c.

h_r = SelfAttn_sentence([h_r1, h_r2, ..., h_rl]) ∈ R^d_BERT
p_stance = softmax(MLP_stance(h_r)) ∈ (0, 1)^3 (4)
• Kernel Graph Attention Network.
          <xref ref-type="bibr" rid="ref18">Liu et al. (2020)</xref>
          proposed KGAT as a stance prediction module for their
pipeline solution on the FEVER (Thorne et al. 2018) task.
In addition to the Graph Attention Network (Veličković
et al. 2017), which applies attention mechanisms to each
word pair and sentence pair in the input paragraph, KGAT
applies a kernel pooling mechanism
          <xref ref-type="bibr" rid="ref35">(Xiong et al. 2017)</xref>
          to
extract better features for stance prediction. We integrate
KGAT (Liu et al. 2020) into our multi-task learning model
for stance prediction on SCIFACT
          <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
          .
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>KGAT Module</title>
        <p>The KGAT module takes the word representations
of the claim h_c and the predicted rationale sentence
representations h_R as inputs, and outputs the probability of
the paragraph serving the role of {SUPPORTS, REFUTES,
NOINFO} with respect to the claim c.</p>
        <p>h_c = [h_CLS, h_cw1, ..., h_cwn]
h_Ri = [h_SEP, h_rw1, ..., h_rwm] where p_not_r_i &lt; p_r_i
h_R = [h_R1, h_R2, ..., h_Rl]
p_stance = KGAT(h_c, h_R) ∈ (0, 1)^3 (5)</p>
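<p>The simple-attention variant of stance prediction can be sketched as follows (a single scoring vector stands in for the two-layer self-attention MLP; KGAT replaces this pooling with kernel-based attention over word and sentence pairs):</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict_stance(h_rationales, attn_w, W_stance, b_stance):
    """Simple-attention stance prediction over rationale sentences.

    h_rationales: (r, d) predicted rationale representations
    (the dummy sentence when no real rationale is selected).
    attn_w: (d,) scoring vector for the sentence-level attention.
    W_stance: (d, 3), b_stance: (3,) for the 3-way classifier.
    Returns probabilities over (SUPPORTS, REFUTES, NOINFO).
    """
    alpha = softmax(h_rationales @ attn_w)    # sentence attention weights
    h_para = alpha @ h_rationales             # paragraph rationale rep
    return softmax(h_para @ W_stance + b_stance)
```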
        <sec id="sec-3-6-1">
          <title>Multi-task Learning</title>
          <p>
            We train our model on rationale selection and stance prediction using a multi-task learning
approach
            <xref ref-type="bibr" rid="ref7">(Caruana 1997)</xref>
            . We use cross-entropy loss as the
training objective for both tasks. We introduce a coefficient λ
to adjust the proportion of the two loss values L_rationale and
L_stance in the joint loss L:
L = λ · L_rationale + L_stance
          </p>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <title>Scheduled Sampling</title>
        <p>
          Because the stance prediction
module takes the predicted rationale sentences as the input,
errors in rationale selection may propagate to the stance
prediction module, especially during the early stage of
training. To mitigate this issue, we apply scheduled sampling
          <xref ref-type="bibr" rid="ref6">(Bengio et al. 2015)</xref>
, which starts by feeding the ground-truth
rationale sentences to the stance prediction module
and gradually increases the proportion of predicted
rationale sentences, until eventually all input sentences are
predicted rationale sentences. We use a sine function to
compute the probability of sampling predicted rationale
sentences, p_sample, as a function of training progress:
progress = (current epoch − 1) / (total epoch − 1) (6)
p_sample = sin((π/2) × progress) (7)
        </p>
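<p>The sampling schedule can be sketched as:</p>

```python
import math
import random

def scheduled_sample(current_epoch, total_epoch, rng=random.random):
    """Decide whether to feed predicted (True) or ground-truth (False)
    rationales to the stance module at this training step.

    p_sample rises from 0 at epoch 1 to 1 at the final epoch along a
    sine curve, so early training mostly sees gold rationales.
    """
    progress = (current_epoch - 1) / (total_epoch - 1)
    p_sample = math.sin(math.pi / 2 * progress)
    return rng() < p_sample
```

At epoch 1 the stance module always receives gold rationales; by the final epoch it always receives the model’s own predictions.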
        <sec id="sec-3-7-1">
          <title>Negative Sampling and Down-sampling</title>
          <p>Although the abstract retrieval module filters out the majority of the
negative candidate abstracts, the false-positive rate remains
inevitably high in order to ensure the retrieval of most of the
positive abstracts. As a result, the input to the joint
prediction model is highly biased towards negative samples.
Therefore, in addition to the positive samples from the SCIFACT
dataset (Wadden et al. 2020), we perform negative
sampling (Mikolov et al. 2013) to sample the top k_train most
similar negative abstracts using our abstract retrieval module, as
an augmented dataset for training and validation, to increase
the downstream model’s tolerance of false-positive abstracts.
Furthermore, in order to increase the diversity of the dataset,
we augment the dataset by down-sampling sentences within each paragraph.</p>
        </sec>
      </sec>
      <sec id="sec-3-11">
        <title>FEVER Pre-training</title>
        <p>
          As Wadden et al. (2020) proposed,
due to the similar task structure of FEVER (Thorne et al.
2018) and SCIFACT
          <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
          , we first pre-train
our model on the FEVER dataset, then fine-tune it on the
SCIFACT dataset by partially re-initializing the rationale selection
and stance prediction attention modules.
        </p>
      </sec>
      <sec id="sec-3-13">
        <title>Domain Adaptation</title>
        <p>
          Instead of pre-training, we also explore domain adaptation
          <xref ref-type="bibr" rid="ref10 ref22">(Peng and Dredze 2017)</xref>
          from
FEVER (Thorne et al. 2018) to SCIFACT
          <xref ref-type="bibr" rid="ref33">(Wadden et al.
2020)</xref>
          . We use shared representations for the compact
paragraph encoding and word-level attention, while using
domain-specific representations for the rationale selection
and stance prediction modules.
        </p>
      </sec>
      <sec id="sec-3-14">
        <title>Hyper-parameter Settings</title>
        <p>[Table 1 residue: the explored hyper-parameters include k_retrieval, k_FEVER, k_train, dropout, learning rate, BERT learning rate, and batch size; the explored values are not recoverable from the extraction.]</p>
      </sec>
      <sec id="sec-3-15">
        <title>Dummy Rationale Sentence</title>
        <p>We dynamically feed only
the predicted rationale sentence representations to the stance
prediction module. To address the special case when an
abstract contains no rationale sentences, we prepend a fixed
dummy sentence (e.g., “@”) whose rationale label is always 0
to the beginning of each paragraph. When the stance
prediction module has no actual rationale sentence to take
as input, we feed it with the representation of the dummy
sentence and expect the module to predict NOINFO.</p>
      </sec>
      <sec id="sec-3-16">
        <title>Post Processing</title>
        <p>To prevent inconsistency between the
outputs of rationale selection and stance prediction, we
enforce the predicted stance to be NOINFO if no rationale
sentence is proposed.</p>
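<p>A minimal sketch of this consistency rule, with hypothetical string labels:</p>

```python
def postprocess(stance, rationale_ids):
    """If no rationale sentence was proposed, force the predicted
    stance to NOINFO so the two outputs cannot contradict."""
    return ("NOINFO" if not rationale_ids else stance), rationale_ids
```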
      </sec>
      <sec id="sec-3-17">
        <title>Hyper-parameters</title>
        <p>Table 1 lists the hyper-parameters
used for training the Paragraph-Joint model in Table 4,
where k_FEVER refers to the number of negative samples
retrieved from FEVER (Thorne et al. 2018) for model
pre-training.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments</title>
      <sec id="sec-4-1">
        <title>4.1 SCIFACT Dataset</title>
        <p>SCIFACT (Wadden et al. 2020) is a small dataset whose corpus contains 5183 abstracts. There are 1409 claims, including 809 in the training set, 300 in the development set, and 300 in the test set.</p>
        <p>1. https://github.com/jacklxc/ParagraphJointModel</p>
        <sec id="sec-4-1-3">
          <title>4.2 Abstract Retrieval</title>
          <p>[Residue of results tables: rows for Paragraph-Pipeline, Paragraph-Joint, Paragraph-Joint KGAT, VERT5ERINI*, and a sentence-level baseline, with sentence-level Selection-Only and Selection+Label precision, recall, and F1 columns; the individual scores are not recoverable from the extraction.]</p>
          <p>
            Table 2 compares the performance of abstract retrieval
modules using TF-IDF and BioSentVec
            <xref ref-type="bibr" rid="ref20 ref38 ref8">(Chen, Peng, and
Lu 2019)</xref>
            . As Table 2 indicates, the overall difference
between these two methods is small.
            <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
chose k_retrieval = 3 to maximize the F1 score of the abstract
retrieval module, while we choose a larger k_retrieval to
pursue higher recall, in order to retrieve more positive
abstracts for the downstream models.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.3 Baseline Models</title>
        <sec id="sec-4-2-1">
          <title>VERISCI</title>
          <p>
            Along with the SCIFACT task and dataset,
            <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
            proposed VERISCI, a sentence-level,
pipeline-based solution. After retrieving the top similar
abstracts for each claim with the TF-IDF vectorization method,
they applied a sentence-level “BERT to BERT” model
            <xref ref-type="bibr" rid="ref12">DeYoung et al. (2019)</xref>
            to extract rationales, sentence by sentence,
with a BERT model, and they predicted the stance with
another BERT model using the concatenation of the extracted
rationale sentences.
            <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
            used RoBERTa-large
            <xref ref-type="bibr" rid="ref17">(Liu et al. 2019)</xref>
            as their BERT model and pre-trained their
stance prediction module on the FEVER dataset (Thorne
et al. 2018).
          </p>
        </sec>
        <sec id="sec-4-2-2">
          <title>VERT5ERINI</title>
          <p>
            Very recently, Pradeep et al. (2020) proposed a strong model, VERT5ERINI, based on T5
            <xref ref-type="bibr" rid="ref26">(Raffel
et al. 2019)</xref>
            . They applied T5 for all three steps of the
SCIFACT task in a sentence-level, pipeline fashion. Because of
the known significant performance gap between
RoBERTa-large
            <xref ref-type="bibr" rid="ref17">(Liu et al. 2019)</xref>
            , which we use, and T5 (Raffel et al. 2019;
Pradeep et al. 2020), we only use VERT5ERINI as a reference (marked with *).
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.4 Model Performances and Ablation Studies</title>
        <sec id="sec-4-3-1">
          <title>Setup</title>
          <p>We experiment on the oracle task, which performs rationale
selection and stance prediction given the oracle abstracts
(Table 3), and the open task, which performs the full task
of abstract retrieval, rationale selection, and stance
prediction (Table 4). We tune our models based on the
sentencelevel, final development set performance (Selection+Label).</p>
        </sec>
          <p>The test labels are not released by Wadden et al. (2020). Unless explicitly stated, all models are pre-trained on FEVER (Thorne et al. 2018).</p>
      </sec>
      <sec id="sec-4-4">
        <title>Paragraph-level Model vs. Sentence-level Model</title>
        <p>
          We compare our paragraph-level pipeline model against
VERISCI
          <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
          , a
sentence-level solution, on the oracle task. As Table 3 shows, our
paragraph-level pipeline model (Paragraph-Pipeline)
outperforms VERISCI, particularly on rationale selection. This
suggests the benefit of computing the contextualized
sentence representations using the compact paragraph
encoding over individual sentence representations.
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>Joint Model vs. Pipeline Model</title>
        <p>Although our joint
model does not show benefits over the pipeline model
on the oracle task (Table 3), the benefit emerges on the
open task. Along with negative sampling, which greatly
increases the models’ tolerance of false-positive
abstracts, the Paragraph-Joint model shows its benefit over the
Paragraph-Pipeline model. The small difference between the
Paragraph-Joint model and the same model with TF-IDF
abstract retrieval (Paragraph-Joint TF-IDF) shows that
the performance improvement is mainly attributable to the
joint training rather than to replacing TF-IDF similarity with
BioSentVec embedding similarity in abstract retrieval.</p>
        <sec id="sec-4-5-4">
          <title>Pre-training vs. Domain Adaptation</title>
          <p>
            We also compare
two methods of transfer learning from FEVER (Thorne et al.
2018) to SCIFACT
            <xref ref-type="bibr" rid="ref33">(Wadden et al. 2020)</xref>
            . Table 4 shows that
the effect of pre-training (Paragraph-Joint) or domain
adaptation
            <xref ref-type="bibr" rid="ref10 ref22">(Peng and Dredze 2017)</xref>
            (Paragraph-Joint DA) is
similar. Both are effective transfer learning methods, as they
significantly outperform the same model that is only trained
on SCIFACT (Paragraph-Joint SCIFACT-only).
          </p>
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>KGAT vs. Simple Attention as Stance Prediction Module</title>
        <sec id="sec-4-6-1">
          <title>Discussion</title>
          <p>
            We expected a significant performance improvement from applying the strong stance prediction model KGAT
            <xref ref-type="bibr" rid="ref18">(Liu et al.
2020)</xref>
            , but the actual improvement is limited. This is likely
due to the strong regularization of KGAT that under-fits the
training data.
          </p>
          <p>Test-set Performance on the SCIFACT Leaderboard. As of
the latest update of this paper, our Paragraph-Joint model
trained on the combination of the SCIFACT training and
development sets achieved first place on the SCIFACT
leaderboard.2 We obtain a test sentence-level F1 score
(Selection+Label) of 60.9% and a test abstract-level F1 score
(Label+Rationale) of 67.2%.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Related Work</title>
      <p>
        Fact-verification has been widely studied. There are many
datasets available on various domains
        <xref ref-type="bibr" rid="ref1 ref13 ref20 ref23 ref23 ref32 ref34 ref5 ref8">(Vlachos and Riedel
2014; Ferreira and Vlachos 2016; Popat et al. 2017; Wang
2017; Derczynski et al. 2017; Popat et al. 2017; Atanasova
2018; Baly et al. 2018; Chen et al. 2019; Hanselowski et al.
2019)</xref>
        , among which the most influential is the FEVER
shared task (Thorne et al. 2018), which aims to develop
systems that check the veracity of human-generated claims
by extracting evidence from Wikipedia. Most existing
systems
        <xref ref-type="bibr" rid="ref20 ref38 ref8">(Nie, Chen, and Bansal 2019)</xref>
        leverage a three-step
pipeline approach, building a module for each step:
document retrieval, rationale selection, and fact verification.
Many of them focus on the claim verification step
        <xref ref-type="bibr" rid="ref18">(Zhou
et al. 2019; Liu et al. 2020)</xref>
        , such as KGAT
        <xref ref-type="bibr" rid="ref18">(Liu et al. 2020)</xref>
        ,
one of the top models on the FEVER leaderboard. On the other
hand, there are some attempts on jointly optimizing rationale
selection and stance prediction. TwoWingOS
        <xref ref-type="bibr" rid="ref36 ref37">(Yin and Roth
2018)</xref>
        leverages attentive CNN
        <xref ref-type="bibr" rid="ref36 ref37">(Yin and Schütze 2018)</xref>
        to
inter-wire the two modules, while
        <xref ref-type="bibr" rid="ref15">Hidey et al. (2020)</xref>
        used a
single pointer network
        <xref ref-type="bibr" rid="ref30 ref6">(Vinyals, Fortunato, and Jaitly 2015)</xref>
        for
both sub-tasks. We propose another variation that directly
links the two modules by a dynamic attention mechanism.
(Footnote 2: https://leaderboard.allenai.org/scifact/submissions/public, as of February 12, 2021.)
      </p>
      <sec id="sec-5-2">
        <title>Applying FEVER Systems to SCIFACT</title>
        <p>
          Because SCIFACT (Wadden et al. 2020) is a scientific version of FEVER (Thorne et al. 2018), systems designed for
FEVER can be applied to SCIFACT in principle. However,
as a fact-verification task in the scientific domain, the SCIFACT task
inherits the common issue of lacking sufficient data,
which can be mitigated with transfer learning by
leveraging language models and introducing external datasets. The
baseline model by
          <xref ref-type="bibr" rid="ref33">Wadden et al. (2020)</xref>
          leverages
RoBERTa-large
          <xref ref-type="bibr" rid="ref17">(Liu et al. 2019)</xref>
fine-tuned on the FEVER dataset (Thorne
et al. 2018), while VERT5ERINI
          <xref ref-type="bibr" rid="ref25">(Pradeep et al. 2020)</xref>
          leverages T5
          <xref ref-type="bibr" rid="ref26">(Raffel et al. 2019)</xref>
and is fine-tuned on the MS MARCO
dataset
          <xref ref-type="bibr" rid="ref3">(Bajaj et al. 2016)</xref>
          . In this work, in addition to
fine-tuning RoBERTa-large on FEVER, we also explore domain
adaptation
          <xref ref-type="bibr" rid="ref10 ref22">(Peng and Dredze 2017)</xref>
          to mitigate the low-resource
issue.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>In this work, we propose a novel paragraph-level multi-task
learning model for the SCIFACT task. Experiments show that
(1) the compact paragraph encoding method is beneficial
over separately computing sentence embeddings, and (2) with
negative sampling, the joint training of rationale selection
and stance prediction is beneficial over the pipeline solution.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>We thank the anonymous reviewers for their useful comments,
and Dr. Jessica Ouyang for her feedback. This work
is supported by a National Institutes of Health (NIH) R01
grant (LM012592). The views and conclusions of this paper
are those of the authors and do not reflect the official policy
or position of NIH.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Atanasova, P.; Màrquez, L.; Barrón-Cedeño, A.; Elsayed, T.; Suwaileh, R.; Zaghouani, W.; Kyuchukov, S.; Da San Martino, G.; and Nakov, P. 2018.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims, Task 1: Check-worthiness. In Working Notes of the Conference and Labs of the Evaluation Forum, CLEF, volume 18.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Bajaj, P.; Campos, D.; Craswell, N.; Deng, L.; Gao, J.; Liu, X.; Majumder, R.; McNamara, A.; Mitra, B.; Nguyen, T.; et al. 2016.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Baly, R.; Mohtarami, M.; Glass, J.; Màrquez, L.; Moschitti, A.; and Nakov, P. 2018. Integrating stance detection and fact checking in a unified corpus. arXiv preprint arXiv:1804.08012.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>Bengio, S.; Vinyals, O.; Jaitly, N.; and Shazeer, N. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, 1171–1179.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>Caruana, R. 1997. Multitask learning. Machine Learning 28(1): 41–75.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>Chen, Q.; Peng, Y.; and Lu, Z. 2019. BioSentVec: creating sentence embeddings for biomedical texts. In 2019 IEEE International Conference on Healthcare Informatics (ICHI), 1–5. IEEE.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>2019. Seeing things from a different angle: Discovering diverse perspectives about claims. arXiv preprint arXiv:1906.03538.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>W. S.; and Zubiaga, A. 2017. SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours. arXiv preprint arXiv:1704.05972.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>DeYoung, J.; Jain, S.; Rajani, N. F.; Lehman, E.; Xiong, C.; Socher, R.; and Wallace, B. C. 2019. ERASER: A benchmark to evaluate rationalized NLP models. arXiv preprint arXiv:1911.03429.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>Ferreira, W.; and Vlachos, A. 2016. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1163–1168.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>2019. A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking. arXiv preprint arXiv:1911.01214.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>Hidey, C.; Chakrabarty, T.; Alhindi, T.; Varia, S.; Krstovski, K.; Diab, M.; and Muresan, S. 2020. DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>arXiv preprint arXiv:2004.12864.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>Liu, Z.; Xiong, C.; Sun, M.; and Liu, Z. 2020. Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7342–7351.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>Nie, Y.; Chen, H.; and Bansal, M. 2019. Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 6859–6866.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>Pagliardini, M.; Gupta, P.; and Jaggi, M. 2018. Unsupervised learning of sentence embeddings using compositional n-gram features. In NAACL 2018 – Conference of the North American Chapter of the Association for Computational Linguistics.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>Peng, N.; and Dredze, M. 2017. Multi-task multi-domain representation learning for sequence tagging. In Proceedings of the 2nd Workshop on Representation Learning for NLP.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>Popat, K.; Mukherjee, S.; Strötgen, J.; and Weikum, G. 2017.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>Where the truth lies: Explaining the credibility of emerging claims on the web and social media. In Proceedings of the 26th International Conference on World Wide Web Companion, 1003–1012.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>Pradeep, R.; Ma, X.; Nogueira, R.; and Lin, J. 2020. Scientific claim verification with VERT5ERINI. arXiv preprint arXiv:2010.11930.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>2018. FEVER: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>Vinyals, O.; Fortunato, M.; and Jaitly, N. 2015. Pointer networks.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>In Advances in Neural Information Processing Systems, 2692–2700.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>Vlachos, A.; and Riedel, S. 2014. Fact checking: Task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, 18–22.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>Wadden, D.; Lo, K.; Wang, L. L.; Lin, S.; van Zuylen, M.; Cohan, A.; and Hajishirzi, H. 2020. Fact or Fiction: Verifying Scientific Claims. In EMNLP.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>Wang, W. Y. 2017. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>Xiong, C.; Dai, Z.; Callan, J.; Liu, Z.; and Power, R. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 55–64.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>Yin, W.; and Roth, D. 2018. TwoWingOS: A two-wing optimization strategy for evidential claim verification. arXiv preprint arXiv:1808.03465.</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>Yin, W.; and Schütze, H. 2018. Attentive convolution: Equipping CNNs with RNN-style attention mechanisms. Transactions of the Association for Computational Linguistics 6: 687–702.</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>2019. GEAR: Graph-based evidence aggregating and reasoning for fact verification. arXiv preprint arXiv:1908.01843.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>