<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Crowdsourcing Tasks for Accurate Misinformation Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronald Denaux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Flavio Merenda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Jose Manuel Gomez-Perez</string-name>
          <email>jmgomezg@expertsystem.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Expert System</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>For all the recent advancements in Natural Language Processing and deep learning, current systems for misinformation detection are still woefully inaccurate on real-world data. Automated misinformation detection systems that are available to the general public and produce explainable ratings are therefore still an open problem, and the involvement of domain experts, journalists or fact-checkers is necessary to correct the mistakes such systems currently make. Reliance on such expert feedback imposes a bottleneck and prevents scalability of current approaches. In this paper, we propose a method, based on a recent semantic-based approach for misinformation detection called Credibility Reviews (CR), to (i) identify real-world errors of the automatic analysis; (ii) use the semantic links in the CR graphs to identify the steps in the misinformation analysis which may have caused the errors; and (iii) derive crowdsourcing tasks to pinpoint the source of the errors. As a bonus, our approach generates real-world training samples which can improve existing datasets and the accuracy of the overall system.</p>
      </abstract>
      <kwd-group>
        <kwd>Disinformation Detection</kwd>
        <kwd>Crowdsourcing</kwd>
<kwd>Credibility Signals</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>One of the reasons that makes misinformation a hard problem is that verifying
a claim requires skills that only a fraction of the population has: typically
well-educated domain experts, fact-checkers or journalists who know where to find
verifying information for a particular domain. As a consequence, fact-checking is a
task that cannot easily be performed by crowdsource workers, who have different
levels of education and may lack specific domain knowledge. This
bottleneck means in turn that it is difficult to train accurate, domain-independent,
automated systems to help in the fact-checking process, as there is a relatively
limited amount of fact-checks available. Furthermore, available fact-checks are
highly biased towards claims of specific domains considered more important at
the time, e.g. political claims during elections or health claims during pandemics.
</p>
      <p>
        Several automated systems have been proposed [
        <xref ref-type="bibr" rid="ref11 ref2 ref4 ref5">2,5,4,11</xref>
        ] to help in
misinformation detection tasks. However, their accuracy is still quite poor at the overall
task of detecting misinforming claims, articles or social media posts in the wild.
Ideally, these systems would catch misinformation before it spreads on social
media, which means they should be accurate based on the content of the reviewed
item. Current content-based systems only achieve about 72% accuracy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on
datasets like FakeNewsNet, which are relatively easy as they (i) provide plenty
of content (news articles), (ii) are simplified into a binary classification (fake or
real), and (iii) have already been reviewed by fact-checkers. (Social signals such as replies and likes provide further
evidence which can improve accuracy [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ], but can only be used after the content has spread.)
      </p>
      <p>
        In our previous work on Linked Credibility Reviews (LCRs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we showed
that our implementation, called acred, obtained state-of-the-art results based on
the following steps:
- simple content decomposition: basing the credibility of more complex
documents, like articles and tweets, on their parts (such as sentences or linked articles)
and metadata (such as the publisher website). In our current implementation of
acred, we have introduced a checkworthiness filter to only take into account
sentences which are factual statements; this is implemented as a RoBERTa model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] fine-tuned on a combination of datasets: CBD [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Clef'20 Task 1 (see https://github.com/sshaar/clef2020-factchecking-task1) and claims
extracted from ClaimReview metadata, with which we obtain weighted F1 scores of 0.85 on
Clef'19 Task 1 and 0.95 on the 2020 debate set (see
https://github.com/idirlab/claimspotter/tree/master/data/two class).
- linking those sentences to a database of claims already reviewed. This linking
is achieved using simple, domain-independent linguistic tasks such as
semantic similarity and stance detection, for which high-accuracy deep learning
models can be trained (92% accuracy on stance detection and 83 Pearson
correlation on semantic similarity, using RoBERTa).
- normalising existing evidence for:
claims, from ClaimReviews provided by reputable fact-checkers, and
websites, from reputation scores by WebOfTrust, NewsGuard, and others.
      </p>
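      <p>To make these steps concrete, the following sketch (in Python, purely illustrative and not acred's actual code) shows how content decomposition, the checkworthiness filter and claim linking via similarity and stance could fit together; all function names, stub implementations and thresholds are assumptions for illustration only.</p>
      <preformat>
# Minimal sketch of the decomposition and linking pipeline described above.
# The stub functions stand in for the fine-tuned RoBERTa models mentioned in
# the text; names, heuristics and thresholds are illustrative assumptions.
import re

def split_into_sentences(text):
    return [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]

def is_checkworthy(sentence):
    # Placeholder for the checkworthiness filter (factual and verifiable).
    return not sentence.endswith("?")

def similarity(a, b):
    # Placeholder for semantic similarity (acred uses a RoBERTa model).
    shared = set(a.lower().split()).intersection(b.lower().split())
    return len(shared) / max(len(set(a.lower().split())), 1)

def stance(a, b):
    # Placeholder for stance detection (acred uses a RoBERTa model).
    return "agree" if similarity(a, b) > 0.5 else "discuss"

def review_content(text, claim_db):
    """claim_db: list of (already_reviewed_claim, credibility_rating) pairs."""
    evidence = []
    for sentence in split_into_sentences(text):
        if not is_checkworthy(sentence):
            continue
        for claim, rating in claim_db:
            sim = similarity(sentence, claim)
            if sim > 0.7:  # illustrative linking threshold
                evidence.append((sentence, claim, sim, stance(sentence, claim), rating))
    if not evidence:
        return 0.0, evidence                  # no evidence found: credibility unknown
    best = max(evidence, key=lambda e: e[2])  # keep the most similar match
    sentence, claim, sim, rel, claim_rating = best
    if rel == "agree":
        return claim_rating, evidence         # inherit the linked claim's rating
    if rel == "disagree":
        return -claim_rating, evidence        # disagreement flips the rating
    return 0.0, evidence                      # 'discuss'/'unrelated': no clear signal
      </preformat>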
      <p>Surprisingly, initial error analysis showed that most of the errors could be
traced back to the sentence linking steps. One of the advantages of the LCR
approach is that it generates a graph of sub-reviews, rather than just producing
a single credibility label. In this paper we propose a method for exploiting the
traceability of LCRs in order to (i) be able to crowdsource the error analysis
process and (ii) derive new training samples for credibility review subtasks like
semantic similarity and stance detection.
</p>
    </sec>
    <sec id="sec-2">
      <title>Problem and Intuition</title>
      <p>
        Consider the tweet shown in Figure 1a. Using acred, we can generate a credibility
review for that tweet, which we can show to the users in a couple of ways. The
most concise way is shown as a bar on top of the tweet in Fig. 1a; the bar displays
the acred credibility label for the tweet. To the right of the label, we see a couple
of buttons that allow users to provide feedback about whether they agree (happy
face) or disagree (sad face) with the label assigned by the system. In this case,
the numbers indicate there is a clear majority of users who disagree with the
label, which tells us that something has gone wrong in acred's analysis. The
challenge is figuring out which step(s) in the acred analysis introduced errors.
Fig. 2a shows the graph of all the evidence gathered and considered by acred
in order to produce the "credible" label shown to the user. Each of the "meter"
icons is a sub-review (e.g. a credibility review of one sentence in the tweet, or
a similarity review between that sentence and some other sentence for which a
credibility value is known) which contributed to the final rating. Therefore any
of those steps could have introduced an error, but which ones? Obviously we
do not want to generate tasks for all 36 sub-reviews. Instead, we want to select
the sub-reviews most likely to have produced the error. The rest of the paper
discusses how to do that and what kind of crowdsourcing task could be used to
find errors in the graph.
      </p>
      <p>Intuition for our approach. LCR bots, responsible for contributing the
sub-reviews, tend to apply heuristics to select certain sub-reviews (and discard
others). In Figure 1b we see an interface showing a card for the final credibility
review for the tweet. In essence, it summarises the graph shown in Fig. 2a.
The generated explanation clearly only uses some of the evidence in the graph.
In particular, we see that the explanation hinges on just one of the sentences in
the tweet and on its agreement with a similar sentence found on a website deemed to
be credible. This chain of evidence is shown in Fig. 2b, which is a subset of 7
(out of the initial 36) sub-reviews from Fig. 2a. In this sub-graph, all the sub-reviews
directly contribute to the final label. Since the final label is erroneous, one or
more of these evidence nodes must have introduced some error (note that some of
the discarded sub-reviews may also be erroneous, but those errors did not contribute
to the final label, hence we ignore them).</p>
    </sec>
    <sec id="sec-3">
      <title>Crowdacred</title>
      <sec id="sec-3-1">
        <title>Preliminaries</title>
        <p>
          In this section we formalise the problem and our approach, called Crowdacred.
        </p>
        <p>
          Schema.org Reviews and Credibility Reviews. Linked Credibility Reviews
(LCR) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is a linked data model for composable and explainable misinformation
detection. A Credibility Review (CR) is an extension of the generic Review data
model defined in Schema.org. A Review R can be conceptualised as a tuple
(d, r, p) where R:
- reviews a data item d, via property itemReviewed; d can be any linked-data
node (e.g. an article, claim or social media post);
- assigns a numeric or textual rating r to (some, often implicit, reviewAspect
of) d, via property reviewRating;
- optionally provides provenance information p, e.g. via properties author and
isBasedOn.</p>
        <p>[Fig. 1: (a) Tweet with label and feedback buttons; (b) Credibility Review with explanation.]</p>
        <p>A Credibility Review (CR) is a subtype of Review, defined as a tuple ⟨d, r, c, p⟩,
where the CR:
- r must have reviewAspect credibility and is recommended to be
expressed as a numeric value in the range [-1, 1], qualified with a rating
confidence c (in the range [0, 1]);
- the provenance p is mandatory and must include information about:
the credibility signals (CS) used to derive the credibility rating, which can
be either (i) Reviews for data items relevant to d or (ii) ground credibility
signal (GCS) resources (which are not CRs) in databases curated by a
trusted person or organisation;
the author of the review. The author can be a person, an organisation or a
bot. Bots are automated agents that produce CRs.</p>
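        <p>As an illustration only (this is not part of the LCR specification nor acred's code base), the Review and Credibility Review tuples could be represented as follows; the class and field names simply mirror the Schema.org properties mentioned above.</p>
        <preformat>
# Illustrative Python representation of a Review (d, r, p) and a
# Credibility Review (d, r, c, p). Field names mirror Schema.org properties
# (itemReviewed, reviewRating, isBasedOn); this is not acred's actual data model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Review:
    item_reviewed: str                         # d: the reviewed data item
    rating: float = 0.0                        # r: the (numeric) rating
    review_aspect: str = ""                    # aspect of d being rated
    author: str = ""                           # p: person, organisation or bot
    is_based_on: List["Review"] = field(default_factory=list)  # p: sub-reviews

@dataclass
class CredibilityReview(Review):
    review_aspect: str = "credibility"
    confidence: float = 0.0                    # c: rating confidence in [0, 1]
    # the rating r is recommended to be in the range [-1, 1]
        </preformat>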
        <p>For this paper, the main thing to take into account is that the CR for a
particular data item (e.g. a Tweet) is composed of many "sub-reviews" which
are available by following the provenance relation p. For any specific CRi, we
refer to the overall set of nodes Vi (Reviews, authors, data items and GCS) and
the links between them (Ei) as the Evidence Graph Gi = (Vi, Ei) for CRi.</p>
        <p>Crowdsourcing Review Tasks. A Crowdsourcing Review Task (subsequently
referred to simply as a task) t is defined as a tuple ⟨d, a, o⟩, where d is a data item to
be reviewed by the user; a is the aspect of d that needs to be reviewed; and o is a
set of possible review values. Tasks need to be performed by human users, hence
we require a function f_render which renders the task in a way that a user can
inspect. The user performs the task by inspecting the rendering and selecting
one of the available options, which produces a review of the form (d, ra, pu),
where ra is a rating for aspect a whose ratingValue is one of the options in o.</p>
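        <p>A task and the review it produces could be sketched as follows (again purely illustrative: the class, its methods and the dictionary layout are assumptions, not acred's API).</p>
        <preformat>
# Illustrative sketch of a Crowdsourcing Review Task t = (d, a, o) and of the
# review produced when a user selects one of the options.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ReviewTask:
    item: str                  # d: the data item (or its identifier) to review
    aspect: str                # a: the aspect of d that needs to be reviewed
    options: Tuple[str, ...]   # o: the set of possible review values

    def render(self) -> str:
        # f_render: present the item and the available options to the user
        return f"Please review '{self.item}' for {self.aspect}: " + " / ".join(self.options)

    def perform(self, chosen_option: str, user: str) -> dict:
        # Produces a review (d, r_a, p_u) whose ratingValue is one of the options in o
        assert chosen_option in self.options
        return {"itemReviewed": self.item, "reviewAspect": self.aspect,
                "ratingValue": chosen_option, "author": user}

# Example usage
task = ReviewTask("credibility review of tweet 123", "agreement", ("agree", "disagree"))
print(task.render())
feedback = task.perform("disagree", "some-user")
        </preformat>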
      </sec>
      <sec id="sec-3-2">
        <title>Problem Statement and Overview</title>
        <p>Given an unlabeled data item d and an automatically derived credibility review
for it, CRd = (d, rd, cd, pd) (and therefore its corresponding evidence graph
Gd = (Vd, Ed)), create simple tasks t1, t2, ..., tn which can be performed by
untrained (or minimally trained) workers and which (i) allow us to decide whether rd
is accurate and (ii) if rd is not accurate, identify the sub-reviews Ri ∈ Vd which
directly caused the error. Furthermore, we aim to minimise the number of tasks n.</p>
        <p>In this paper, we propose a two-step method to derive such tasks (a minimal
sketch is given after this list):
1. collect agreement with the overall rating rd;
2. for ratings with high disagreement:
- identify candidate reviews in the evidence graph for rd, and
- derive tasks from the identified candidate reviews.</p>
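        <p>The sketch below ties the two steps together at a high level; the helper functions are placeholders for the steps detailed in the following subsections, and the disagreement threshold is an illustrative assumption.</p>
        <preformat>
# High-level sketch of the proposed two-step Crowdacred loop (not acred code).
def crowdacred(cr, collect_agreement, kept_subgraph, derive_task,
               disagreement_threshold=0.5):
    """cr: a credibility review; returns the crowdsourcing tasks to launch."""
    # Step 1: collect user agreement with the overall rating r_d
    disagreement = collect_agreement(cr)   # fraction of users who disagree
    if disagreement > disagreement_threshold:
        # Step 2: restrict the evidence graph to the kept sub-reviews and
        # derive one task per candidate sub-review
        return [derive_task(sub) for sub in kept_subgraph(cr)]
    return []  # rating is not contested: no tasks needed
        </preformat>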
      </sec>
      <sec id="sec-3-3">
        <title>Capturing Overall Agreement with Credibility Reviews</title>
        <p>In this first step, we generate tasks for users to help us identify CR instances
which have an inaccurate credibility rating. For this, we exploit the explainability
of credibility ratings. We propose the following task:</p>
        <p>Given a user u and a credibility review CRd for data item d, we define
tagreement = ⟨CRd, agreement, oagreement⟩ as a task where the user is shown a
summary of CRd (likely including a rendering of d), and is asked to produce
a rating from oagreement = {agree, disagree}. For this task we consider two specific
rendering functions:
- label maps the values rd and cd onto a credibility label. For example, rd &gt;
0.5 and cd &gt; 0.75 could map to "credible".
- explain generates a more complex textual explanation by following the
provenance information pd (recursively).</p>
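        <p>A possible implementation of the label rendering function is sketched below; only the "credible" case corresponds to the example above, while the other labels and cut-offs are illustrative assumptions.</p>
        <preformat>
# Sketch of the 'label' rendering function. Only the "credible" thresholds
# (r above 0.5 with c above 0.75) come from the example in the text; the
# remaining labels and cut-offs are assumptions for illustration.
def label(rating: float, confidence: float) -> str:
    if confidence > 0.75 and rating > 0.5:
        return "credible"           # example mapping given in the text
    if confidence > 0.75 and -rating > 0.5:
        return "not credible"       # assumption: symmetric negative threshold
    if confidence > 0.75:
        return "uncertain"          # assumption
    return "not verifiable"         # assumption: low-confidence ratings
        </preformat>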
        <p>The result of tagreement is an instance of a Review: (CRd, ragreement, pu). An
example of such a task, using both rendering functions, is shown in Figure 1.</p>
        <p>Although this task is much easier than performing a full fact-check of an
article or claim, it can still be cognitively demanding, and some users may not have
sufficient knowledge about the domain to make an informed decision. Therefore,
we expect this to be a challenging task for most crowdsource workers. As part of
the Co-inform project (https://coinform.eu/), instead of relying on crowdsource workers, we are asking
users of our browser plugin to provide such agreement ratings as an extension
of their daily browsing and news consumption habits. As shown in Fig. 1a, given
sufficient users, a consensus can emerge, enabling detection of erroneous reviews.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Finding Candidate Erroneous Sub-Reviews</title>
        <p>
          Given a credibility review CRd which users have rated as erroneous, in this step
we identify the sub-reviews R1, R2, ..., Rn which have directly contributed to the
final rating and confidence in CRd. Recall that pd provides provenance
information that can be used. In acred, the relevant provenance is implemented by
providing a list of sub-reviews via the property isBasedOn. This list contains
references to all the signals taken into account to derive the rating, but in many
cases the majority of these signals are discarded via aggregation functions (e.g.
selecting the sub-review with the highest confidence or with the lowest credibility
rating [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). Therefore, we propose to define two disjoint subproperties of isBasedOn:
isBasedOnDiscarded and isBasedOnKept.
        </p>
        <p>Using these new subproperties we can define a subgraph Gkept of Gd, which
contains only those nodes which can be linked to the final CRd via isBasedOnKept
edges. To illustrate this idea, Figure 2a shows an example of a full evidence graph,
while Figure 2b shows only the kept subgraph for the same credibility review. As
can be seen from the figures, this step greatly reduces the number of candidate
sub-reviews, while also ensuring that those reviews directly contributed to the
final (presumably erroneous) rating.</p>
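        <p>A minimal sketch of how the kept subgraph could be collected, assuming each review exposes its kept and discarded sub-reviews as lists (the dictionary layout and traversal are illustrative, not acred's implementation):</p>
        <preformat>
# Illustrative traversal collecting the sub-reviews reachable from the
# top-level credibility review via isBasedOnKept edges only.
def kept_subgraph(review):
    """review: dict with optional 'isBasedOnKept'/'isBasedOnDiscarded' lists."""
    kept, stack = [], [review]
    while stack:
        node = stack.pop()
        for sub in node.get("isBasedOnKept", []):
            kept.append(sub)
            stack.append(sub)  # follow the kept sub-review's own evidence
        # 'isBasedOnDiscarded' sub-reviews are ignored: they did not
        # contribute to the final (presumably erroneous) label
    return kept
        </preformat>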
      </sec>
      <sec id="sec-3-5">
        <title>Defining Crowdsourcing Tasks</title>
        <p>
          Now that we have identified a small number of sub-reviews which directly
influence the final credibility rating, we can use crowdsourcing to identify which
steps contributed erroneous evidence. Although we could define user agreement
tasks for the individual steps, we can get more actionable information by asking
the users more specific questions. For this, we need to define custom tasks
for each step in acred. Preliminary error analyses in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] showed that most of the
errors were caused by the linking steps, therefore we discuss three specific types
of Reviews used in acred and how to derive crowdsourcing tasks for them.
        </p>
        <p>
          SentenceCheckworthinessReview determines whether a Sentence is
checkworthy or not. This is the case when the sentence is both factual (i.e. not an
opinion or question) and verifiable (someone can, in principle, find out whether
the sentence is accurate or not). We derive a task tcheckworthy where ocheckworthy =
{checkworthy, notFactual, notVerifiable}. Table 1 shows an example rendering
(and expected answer), based on the sub-reviews in Figures 2b and 1b.
        </p>
        <p>Help us to detect if a sentence contains a factual claim
Do you think the following sentence contains a factual claim?
- "The vast amounts of money made and stolen by China from the United States,
year after year, for decades, will and must STOP."
☐ Yes, and the claim can be verified
☐ Yes, but nobody could verify it
☐ No</p>
        <p>Table 1: Example SentenceCheckworthinessReview Task</p>
        <p>
          SentenceSimilarityReview assigns a similarity score to a pair of sentences
⟨sa, sb⟩; in acred, this is implemented via a RoBERTa model that has been fine-tuned on
STS-B [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which has in part been derived from previous semantic similarity tasks [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. There are existing crowdsourcing tasks defined for this [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], including
instructions and a rating schema, which we can reuse to define tsentenceSimilarity =
⟨d, sentenceSimilarity, osentenceSimilarity⟩. The schema osentenceSimilarity consists of
a scale of 6 values ranging from 0 (the two sentences are completely dissimilar)
to 5 (the two sentences are completely equivalent, as they mean the same thing).
See Table 2 for an example.
        </p>
        <p>Help us to detect how similar two sentences are
Choose one of the options that describes the semantic similarity grade between the
following pair of sentences.
- "The vast amounts of money made and stolen by China from the United States,
year after year, for decades, will and must STOP."
- "The US still supplies much more goods from China and the EU than vice versa."
The two sentences are:
☐ completely equivalent, as they mean the same thing
☐ mostly equivalent, but some unimportant details differ
☐ roughly equivalent, but some important information differs/is missing
☐ not equivalent, but share some details
☐ not equivalent, but are on the same topic
☐ on different topics</p>
        <p>Table 2: Example SentenceSimilarityReview Task</p>
        <p>
          SentenceStanceReview assigns a stance label describing the relation between
a pair of sentences; in acred, this is implemented via another RoBERTa model that has been
fine-tuned on FNC-1 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Although there are many existing datasets [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for this
problem, they differ in their target labels. We find the FNC-1 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] labels (agree,
disagree, discuss and unrelated) provide a good balance, as other datasets are often
missing a label for the unrelated case. Also, the FNC-1 labels have the
advantage that they describe symmetric relations (although this is arguable
for discuss), while other datasets use asymmetric relations like query.
Therefore we define tasks tsentenceStance = ⟨d, sentenceStance, osentenceStance⟩ where
osentenceStance = {agree, disagree, discuss, unrelated}. Table 3 shows an example
of such a task.
        </p>
        <p>Help us to better understand the relation between two sentences
Choose one of the options that describes the relation between the
following sentences.
- "The vast amounts of money made and stolen by China from the United States,
year after year, for decades, will and must STOP."
- "The US still supplies much more goods from China and the EU than vice versa."
The two sentences:
☐ agree with each other
☐ disagree with each other
☐ discuss the same issue
☐ are unrelated</p>
        <p>Table 3: Example SentenceStanceReview Task</p>
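        <p>The three review types and their option sets could be mapped to tasks as in the sketch below; the dictionary-based representation and the derive_task helper are assumptions for illustration, with the option labels taken from Tables 1-3.</p>
        <preformat>
# Illustrative mapping from kept sub-reviews to crowdsourcing tasks t = (d, a, o).
TASK_OPTIONS = {
    "SentenceCheckworthinessReview": ("checkworthy", "notFactual", "notVerifiable"),
    "SentenceSimilarityReview": ("0", "1", "2", "3", "4", "5"),  # 0 = dissimilar .. 5 = equivalent
    "SentenceStanceReview": ("agree", "disagree", "discuss", "unrelated"),
}

def derive_task(sub_review):
    """sub_review: dict with a '@type' and the sentence(s) it reviewed."""
    review_type = sub_review["@type"]
    return {
        "itemReviewed": sub_review["itemReviewed"],  # sentence or sentence pair
        "reviewAspect": review_type.replace("Review", ""),
        "options": TASK_OPTIONS[review_type],
    }
        </preformat>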
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Summary and Future Work</title>
      <p>
        In this paper, we presented Crowdacred, a method for extending Linked
Credibility Reviews to be able to crowdsource (i) the detection of inaccurate credibility
reviews, (ii) the error analysis of erroneous reviews and (iii) the generation of
realistic sample data for the NLP subtasks needed for accurate misinformation detection.
We are currently implementing the proposed method on top of acred [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and plan
to run initial crowdsourcing experiments to validate the approach. The
validation study will be based on a core set of (a few dozen) users from Co-inform
and a larger pool of crowdsource workers. If successful, we aim to be able to
produce new datasets of content in the wild on specific topics such as COVID-19.
      </p>
      <p>
        Acknowledgements. This work was supported by the European Commission under grant
770302 (Co-Inform) as part of the Horizon 2020 research and innovation
programme.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
          </string-name>
          , W.:
          <article-title>*SEM 2013 shared task: Semantic textual similarity</article-title>
          .
          <source>In: Second Joint Conference on Lexical and Computational Semantics (*SEM)</source>
          . pp.
          <fpage>32</fpage>
          –
          <lpage>43</lpage>
          . Association for Computational Linguistics, Atlanta, Georgia, USA (Jun
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Babakar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <source>The State of Automated Factchecking. Tech. rep. (</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Gazpio</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specia</surname>
          </string-name>
          , L.:
          <article-title>SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation</article-title>
          .
          <source>In: Proc. of the 10th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>14</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Denaux</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Linked Credibility Reviews for Explainable Misinformation Detection</article-title>
          . In: 19th International Semantic Web Conference (nov
          <year>2020</year>
          ), https://arxiv.org/abs/2008.12742
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , G.,
          <string-name>
            <surname>Arslan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caraballo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gawsane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayak</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sable</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tremayne</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>ClaimBuster: The first-ever end-to-end fact-checking system</article-title>
          .
          <source>In: Proceedings of the VLDB Endowment</source>
          . vol.
          <volume>10</volume>
          , pp.
          <fpage>1945</fpage>
          –
          <lpage>1948</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
          </string-name>
          , V.:
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          .
          <source>Tech. rep. (</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arslan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devasier</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obembe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims</article-title>
          (Feb
          <year>2020</year>
          ), http://arxiv.org/abs/2002.07725
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pomerleau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The fake news challenge: Exploring how artificial intelligence technologies could be leveraged to combat fake news</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Schiller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daxenberger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Stance Detection Benchmark: How Robust Is Your Stance Detection?</article-title>
          (jan
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Shu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahudeswaran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Liu, H.:
          <article-title>FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media</article-title>
          .
          <source>Tech. rep. (</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Shu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Awadallah</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruston</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Liu, H.:
          <article-title>Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News (</article-title>
          <year>2020</year>
          ), http://arxiv.org/abs/2004.01732
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>