<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Sixth Workshop on Natural Language for Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Hybrid Human-In-The-Loop Framework for Fact Checking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David La Barbera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Roitero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Mizzaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Udine</institution>
          ,
          <addr-line>Via Delle Scienze 206, Udine</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>30</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Online misinformation poses a serious threat to modern society. Assessing the veracity of online information is a complex problem that is nowadays addressed by relying heavily on trained fact-checking experts. This solution is not scalable and, also due to the importance of the problem, the issue has gained the attention of the scientific community, which has proposed many AI-based automatic solutions. Despite the efforts made, the effectiveness of such approaches is not yet sufficient to allow them to be used without supervision. In this position paper, we propose a hybrid human-in-the-loop framework for fact-checking: we address the misinformation issue by relying on a combination of automatic AI methods, crowdsourcing, and experts. We study the single components of the framework as well as their interactions, and we propose an interleaving of the different components which we believe will serve as a useful starting point for future research towards effective and scalable fact-checking.</p>
      </abstract>
      <kwd-group>
        <kwd>Misinformation</kwd>
        <kwd>Human-in-the-loop</kwd>
        <kwd>Artificial Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern times have highlighted the threat that fake news
and misinformation pose to modern society. Traditionally, misinformation detection is a slow and costly
process carried out solely by expert trained fact-checkers, who cannot cope with the ever-increasing
amount of information shared online every day. To address this issue, researchers are developing
automatic techniques to identify misinformation at scale, and significant efforts have been
made to develop fast and scalable state-of-the-art Artificial Intelligence (AI) algorithms [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ].
Another, less traditional, approach to tackle this issue is to take advantage of the wisdom of
the crowd [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and leverage crowdsourcing workers [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref6 ref7 ref8 ref9">6, 7, 8, 9, 10, 11, 12, 13</xref>
        ]. Both approaches
have pros and cons: while AI is usually cheaper and more scalable, crowd-workers can perform
more reliable and explainable classifications. To take the best from both worlds, researchers have
proposed hybrid Human-In-The-Loop (HITL) approaches that integrate AI, crowd, and experts,
even though only a few implementations exist [
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17">14, 15, 16, 17</xref>
        ]. Differently from previous work
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], in this paper we propose a concrete architecture for fact-checking, and we inspect the
responsibilities of each component as well as their interactions. In particular, we detail a
pragmatic workflow which should be implemented to effectively classify the veracity of a set
of statements at scale.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        There are numerous examples of AI techniques for misinformation detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], as well as
academic interest in their development and evaluation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Many different AI approaches
exist: Ozbay and Alatas [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] tested 23 supervised AI algorithms on public datasets, Zhao et al.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] integrated linguistic, topic, sentiment, and behavioral features to develop a model for health
misinformation, Stammbach and Neumann [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] used evidence retrieval techniques and
fine-tuned a BERT-based model for the FEVER challenge, Konstantinovskiy et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] developed a
pipeline to identify misinformation using a multi-task learning approach. Related to that, many
approaches addressed the issue of credibility in social media [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        Focusing on misinformation detection using crowdsourcing, La Barbera et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] first found
an effect of judgment scales and evidence of worker assessors’ bias on political statements,
Soprano et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used the dataset from Roitero et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to leverage a multidimensional scale
to measure different aspects of a statement, Draws et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] found that workers generally
overestimate truthfulness and that different types of workers show different biases when
evaluating a given statement, Pennycook and Rand [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] used the crowd to study the effects of
reducing social media users’ exposure to low-quality news, and Allen et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] compared the
accuracy ratings between fact-checkers and crowd-workers.
      </p>
      <p>
        Finally, some work investigated the combination of AI and humans: Demartini et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
introduced a theoretical hybrid HITL framework for misinformation, Qu et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] used
self-reported scores from both AI and crowd to develop a hybrid system, Shabani et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] used
humans to provide feedback on news stories regarding statements’ contextual information and
integrated those features into an AI pipeline, and Yang et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] showed the potential speed-up
of the fact-checking process obtained by organizing and selecting representative statements.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Limitations of Current Approaches</title>
      <p>As highlighted by Demartini et al. [17, Figure 2], each of the three state-of-the-art approaches
for misinformation detection (i.e., experts, AI tools, and crowd) has its own advantages and
disadvantages in terms of accuracy, scale, cost, explainability, and bias control. We detail these
aspects in this section, focusing on the limitations of each approach.</p>
      <p>
        Certainly AI tools outperform both crowd and experts when considering costs (while training
language models from scratch can cost up to millions of dollars [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], once trained they can be used multiple times leveraging few- or zero-shot learning [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ]) and evaluation speed, but despite recent works [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ], they provide little or no explainability. More importantly,
such models achieve lower accuracy than crowd or experts. To provide some examples, classical
machine learning models achieved 74% accuracy on a two-level scale [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], and the best model of
this year’s CLEF CheckThat! Lab reached 54.7% accuracy on a four-level scale [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Considering
the accuracy of the crowd, experimental results [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] show a high correlation with the
experts in terms of agreement, whereas other work reports accuracy values that are lower and
comparable to those obtained by AI methods [
        <xref ref-type="bibr" rid="ref10 ref11 ref13 ref7 ref8 ref9">7, 8, 9, 10, 11, 13</xref>
        ]; although further studies are
needed to draw definitive conclusions, it seems reasonable to assume that crowd accuracy can
be higher than that of automatic AI solutions. The highest accuracy is achieved by the experts,
which is conventionally set to 1 for practical reasons. Nevertheless, even domain experts need
comparison and discussion phases to reach a final consensus (see for example the process
used by PolitiFact: https://www.politifact.com/article/2018/feb/12/principles-truth-o-meter-politifacts-methodology-i/).
      </p>
      <p>
        Bias is also a crucial limitation of current approaches. Experts and crowd-workers, being
humans, are subject to cognitive biases [
        <xref ref-type="bibr" rid="ref13 ref7">13, 7</xref>
        ], which can be mitigated by the discussion phase
in the case of experts, but are difficult to remove for crowd-workers [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Moreover, all the
aforementioned biases can be propagated from humans to AI models, e.g., when training or
fine-tuning a model.
      </p>
      <p>
        Another limitation of current approaches is given by the specific truthfulness scales used:
different scales exist and are used, and such heterogeneity, apart from making fair comparisons
difficult, has an impact on the quality of the collected data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>We believe that a HITL framework for misinformation detection should address and overcome
all of the limitations detailed above by fruitfully combining the capabilities of AI, crowd, and
experts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. HITL Framework for Misinformation Detection</title>
      <sec id="sec-4-1">
        <title>4.1. Possible Architectures</title>
        <p>A natural solution to the task investigated in this paper is to employ a pipeline model where
the components are sorted by increasing accuracy (i.e., first the AI, then the crowd,
and finally the experts). Thus, if a statement is not adequately classified by a component, the
subsequent pipeline component will perform a more accurate classification. Such a pipeline also
concatenates the components according to their increasing cost and evaluation time. This makes it
possible to run a pipeline of annotation tasks where the majority of the statements are quickly
and automatically labeled by the AI, only a subset of the statements is sent for a slower evaluation
to the crowd, and the few remaining statements are sent to the experts for an in-depth investigation.
The key advantage of this configuration is that it takes the best from each component and
minimizes the overall costs. In particular, this configuration lets the experts (i.e.,
the most costly component) evaluate a very small number of statements. Nevertheless, the
pipeline model has important limitations, as it does not provide feedback among the components:
a statement is simply forwarded until it is eventually classified, with little cooperation
among the components.</p>
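        <p>The pipeline configuration described above can be sketched as follows; the component implementations, confidence values, and thresholds are illustrative assumptions, not part of the proposal.</p>

```python
# Sketch of the pipeline architecture: components ordered by increasing
# accuracy, cost, and evaluation time. Components, confidences, and
# thresholds are illustrative placeholders.

def make_component(name, label, confidence):
    """Build a toy component returning a fixed classification and confidence."""
    def classify(statement):
        return {"component": name, "label": label, "confidence": confidence}
    return classify

# Each entry: (component, forwarding threshold). A statement is forwarded
# when the component's confidence does not reach the threshold; the expert
# threshold is 0, so experts always produce a final classification.
PIPELINE = [
    (make_component("ai", "false", 0.55), 0.90),
    (make_component("crowd", "false", 0.80), 0.75),
    (make_component("expert", "false", 1.00), 0.00),
]

def fact_check(statement):
    """Forward the statement along the pipeline until a component is confident."""
    for classify, threshold in PIPELINE:
        result = classify(statement)
        if result["confidence"] >= threshold:
            return result
    return result

result = fact_check("example statement")
```

        <p>Here the toy AI’s confidence (0.55) does not reach its threshold, so the statement is classified by the crowd component; note that this configuration provides no feedback between components.</p>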
        <p>
          Another possible combination of the components is by means of a blackboard architecture, a
common solution in distributed multi-agent settings [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Such an approach allows the
components to select which statements to evaluate. Each component is an autonomous agent that
can access a central repository that contains both the statements and the partial contributions
provided by each component. This approach would require both a high synergy between the
components and the ability to split a classification task into atomic sub-tasks, in order to take
advantage of each specific component of the architecture.</p>
        <p>[Figure 1: a statement enters a component, which produces a classification and a confidence;
depending on the confidence, the statement is finally classified or forwarded, with feedback, to the
next component.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. General Framework</title>
        <p>An ideal framework should maximize accuracy while minimizing the cost of each component
and strengthening the cooperation between and within its modules. Therefore, we first propose
a basic framework, where each component provides feedback to, and cooperates with, the
others. We then discuss possible variants and extensions.</p>
        <p>
          Our proposal is summarized in Figure 1. Given a statement, each of the three components
(AI, crowd, and experts) generates a classification on a chosen scale and a confidence score
for the performed classification. Whenever the AI or crowd component generates a prediction
with a high confidence score, the statement is considered correctly classified. Otherwise, if
the confidence is low, the statement is forwarded to the subsequent component. In this
case, the output of the component (such as the confidence score and the classification) can
optionally be forwarded along with the statement. This could allow the subsequent component
to perform an informed assessment, if necessary. Also, samples of statements considered
correctly classified by a component (i.e., with high confidence) should be propagated, to
double-check their classification score and deal with the problem of unknown-unknowns (i.e.,
statements for which the AI is highly confident about its predictions but is wrong) using humans
[
          <xref ref-type="bibr" rid="ref31 ref32 ref33">31, 32, 33</xref>
          ]. This allows each component to provide feedback to the previous ones, thereby
improving their classifications.
        </p>
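        <p>A minimal sketch of this forwarding rule, including the propagation of a small sample of high-confidence statements for double-checking, follows; the threshold and the sampling rate are our own assumed values.</p>

```python
import random

SPOT_CHECK_RATE = 0.05  # assumed fraction of confident outputs still propagated

def route(output, threshold=0.9, rng=random.Random(42)):
    """Return 'accept' or 'forward' for a component's output.

    Low-confidence outputs are always forwarded to the next component;
    a small random sample of high-confidence outputs is forwarded as well,
    so confident-but-wrong predictions (unknown-unknowns) can be caught
    downstream and fed back to the previous component.
    """
    confident = output["confidence"] >= threshold
    spot_check = rng.random() >= 1.0 - SPOT_CHECK_RATE
    if confident and not spot_check:
        return "accept"
    return "forward"

low_conf = route({"label": "false", "confidence": 0.4})  # always forwarded
```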
        <p>In the following sections we will detail for each component: its possible internal structure,
its specific interactions with other components, and additional outputs that can be added to the
general framework.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. First Component: AI</title>
        <p>
          Assuming the use of a state-of-the-art model for misinformation detection [
          <xref ref-type="bibr" rid="ref18 ref19 ref2 ref28 ref3 ref34 ref4">28, 2, 19, 3, 4, 34, 18</xref>
          ],
the output provided by the AI component should be at least a classification score on a chosen
truthfulness scale, and a confidence score. While the classification score is straightforward, the
confidence can be reliably calibrated following the methodology by Guo et al. [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. To provide an
adequate classification, the AI tool can rely on a Knowledge Base (KB) to perform evidence retrieval.
Examples of such systems are the ones proposed by La Barbera et al. [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] and Stammbach and
Neumann [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], who both use a transformer architecture that relies on retrieved evidence. The
choice of the KB to use to produce a classification and an explanation is not
straightforward, since there is no evidence of a “universally best” KB [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Thus, the choice
of the specific KB should be performed ad hoc by leveraging statement- and domain-specific
features, such as the topic, speaker, or year of the set of statements being processed.
        </p>
        <p>
          To evaluate the classification score given by the component, we can use optional outputs.
For example, many AI models are able to provide reasons for their predictions [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ]. Some
implementations are provided by Kazemi et al. [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] and by Brand et al. [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ], who develop
models able to generate an explanation for their misinformation assessment. The generation
of an explanation could improve the framework by providing additional, human-readable
information useful for both the subsequent human-based components and the final classification.
        </p>
        <p>
          Finally, the AI component could provide self-feedback by using counterfactual explanations
[
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]: generating instances that the model finds hard to classify or deceiving could improve the
model’s performance, robustness, and generalization abilities.
        </p>
        <p>
          The output of the AI component is thus made of the classification, the confidence, and optional
information, such as the explanation and the retrieved evidence. The decision of whether the statement
has been adequately classified can then be made by relying on the confidence of
the model [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] as detailed in Section 4.2. To support this decision, the optional
explanation could be used, for example by considering its readability or semantic scores. The decision for some
statements might be more critical and not straightforward: a very recent statement made by an
important public figure on a highly relevant topic with little evidence available might
be worth further investigation. Hence, it might be worth studying the effectiveness of an
importance score computed from the statement’s metadata.
        </p>
        <p>Finally, if the assessment for the statement has a low confidence, the explanation is not
satisfactory, or the assessment needs to be refined for any other reason, the statement is sent to
the subsequent component: the crowd.</p>
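        <p>For concreteness, the calibration step can be illustrated with temperature scaling, the post-hoc method studied by Guo et al. [35]: the model’s logits are divided by a temperature fitted on a validation set before the softmax. The logits and the temperature value below are made-up numbers.</p>

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; temperatures above 1 soften the
    distribution, producing less overconfident probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a three-level truthfulness scale.
logits = [4.0, 1.0, 0.5]

raw_confidence = max(softmax(logits))              # tends to be overconfident
calibrated_confidence = max(softmax(logits, 2.0))  # T = 2.0 assumed here;
                                                   # fitted on held-out data in practice
```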
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Second Component: Crowd</title>
        <p>
          As for the AI, the crowd component should perform two tasks: misinformation classification and
providing feedback to itself and to the AI component. There are many examples of misinformation
classification directly performed by the crowd [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref6 ref7 ref8">6, 7, 8, 11, 12, 13</xref>
          ]. It could also be reasonable to
perform an informed assessment relying on the output of the AI component [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ]. Nevertheless,
the use of this additional information could introduce biases into the assessment performed by
the crowd; hence, further studies in this direction are required. Moreover, to reduce workers’
cognitive effort, it is possible to design a two-step task using disjoint sets of workers: the
first set searches for evidence for a given statement, and the second classifies the statement
using the provided evidence (and additional data). While all of the different tasks mentioned
are indeed reasonable, it is necessary to perform ad-hoc studies to find the best possible setting.
Along this line, we can leverage work done in related fields [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] to identify the subset of the best
workers and exploit their features, in order to minimize the workforce needed while at the same
time maximizing its effectiveness.
        </p>
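        <p>As an illustration, the crowd component’s classification and confidence could be derived from individual worker judgments by majority aggregation, using inter-worker agreement as the confidence score; this aggregation rule is an assumption of ours, and the ad-hoc studies mentioned above would be needed to pick the best one.</p>

```python
from collections import Counter

def aggregate_crowd(judgments):
    """Aggregate worker judgments into a (label, confidence) pair.

    The majority label becomes the crowd classification; the fraction of
    workers agreeing with it serves as the confidence score.
    """
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)

label, confidence = aggregate_crowd(
    ["false", "false", "mostly-false", "false", "true"]
)  # 3 of 5 workers agree on "false"
```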
        <p>Also, the crowd can be asked to provide additional rationales to motivate their classifications
[43, 44]. The classifications can be used to improve the AI component by fine-tuning the models
with additional data, and both worker and AI rationales can be used to adjust the confidence
of the final assessment; nevertheless, this should be implemented with caution, as worker
rationales might contain biases that can be involuntarily injected into AI models. Finally, a subset
of crowd-workers should look for counterfactual examples that could highlight AI classification
errors made with high confidence. While these methodologies still need to be tested in the field of
misinformation detection, some work [45] shows the promising results of this approach applied
to different domains.</p>
        <p>
          As for the AI component, the output of the crowd component is composed of the default
classification and confidence, along with optional additional data such as evidence, explanations,
and rationales. Therefore, to decide whether a statement is correctly classified, it is possible
to rely not only on the data generated by the crowd, but also to check for agreement and
inconsistencies between crowd and AI [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>At this point of the evaluation, the majority of the statements have been classified by the
framework, and only a very small subset reaches the final step of the workflow: the experts.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Third Component: Experts</title>
        <p>The last step of the framework is performed by the experts. It is possible to let them evaluate a
statement using a pre-defined fact-checking methodology and, ideally, to provide them with all the
outputs from the previous components so that they can perform an informed assessment. The effects of such
a decision need to be studied since, as discussed for the crowd, the use of additional information
could introduce bias into the final evaluation. We remark that we believe that critical, important,
and difficult statements should always be evaluated, or at least checked, by the experts. Note that
to identify those statements it would be necessary to find a metric able to automatically
evaluate the importance of a statement in a given context. Also, to increase the robustness of
the framework, the experts should be able to directly inspect the statements classified by the
previous components and decide whether some of them need to be re-assessed. Finally,
each classification performed by the experts should be used to re-train the AI models, and used
as an example to train the crowd before the task is performed. This final aspect could also be
carried out interactively, following an active-learning scenario.</p>
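        <p>The expert-to-AI feedback described above can be sketched as a simple loop that buffers expert-verified labels and periodically triggers re-training; the buffer size and the retraining hook are hypothetical.</p>

```python
class ExpertFeedbackLoop:
    """Collect expert classifications and periodically re-train the AI model.

    Buffered expert examples could also be reused to train crowd-workers
    before they perform the task.
    """

    def __init__(self, retrain_fn, batch_size=100):
        self.retrain_fn = retrain_fn  # e.g. fine-tune the AI on expert labels
        self.batch_size = batch_size  # assumed re-training granularity
        self.buffer = []
        self.retrain_calls = 0

    def add_expert_label(self, statement, label):
        self.buffer.append((statement, label))
        if len(self.buffer) >= self.batch_size:
            self.retrain_fn(self.buffer)
            self.retrain_calls += 1
            self.buffer = []

loop = ExpertFeedbackLoop(retrain_fn=lambda examples: None, batch_size=2)
loop.add_expert_label("statement 1", "false")
loop.add_expert_label("statement 2", "mostly-true")  # triggers one re-training
```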
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work we study the limitations of the current approaches for misinformation detection and
propose a hybrid HITL framework that combines AI, crowd, and experts. Our main contributions
are the following: we frame the problem and review the related work detailing frameworks for
fact-checking; we study possible framework architectures detailing their respective advantages
and disadvantages; we propose a solid architecture for performing fact-checking at scale, and we
describe each component focusing on its role and outputs, as well as its interactions with other
components. The main advantages of our framework are given by an efficient combination of
the components in terms of increasing accuracy and evaluation time, decreasing costs, and by
the feedback between and within each component.</p>
      <p>Future work aims at providing a full framework implementation. More in detail, further study
will be done on the synergies between crowd and AI, to investigate the effects of an informed
assessment made by the crowd leveraging AI outputs, and to set thresholds to decide about
statement forwarding among components.</p>
      <p>[43] T. McDonnell, M. Lease, M. Kutlu, T. Elsayed, Why Is That Relevant? Collecting Annotator
Rationales for Relevance Judgments, in: Proceedings of the 4th Conference on Human
Computation and Crowdsourcing, volume 4 of HCOMP, 2016, pp. 139–148.
[44] M. Kutlu, T. McDonnell, M. Lease, T. Elsayed, Annotator Rationales for Labeling Tasks
in Crowdsourcing, Journal of Artificial Intelligence Research 69 (2020) 143–189. doi:10.1613/jair.1.12012.
[45] J. X. Morris, E. Lifland, J. Y. Yoo, J. Grigsby, D. Jin, Y. Qi, TextAttack: A Framework
for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP, 2020.
doi:10.48550/ARXIV.2005.05909.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] D. Nozza, L. Passaro, M. Polignano, Preface to the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI), in: D. Nozza, L. C. Passaro, M. Polignano (Eds.), Proceedings of the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI 2022) co-located with 21th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022), November 30, 2022, CEUR-WS.org, 2022.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] B. Guo, Y. Ding, L. Yao, Y. Liang, Z. Yu, The Future of Misinformation Detection: New Perspectives and Trends, 2019. doi:10.48550/ARXIV.1909.03654.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] F. A. Ozbay, B. Alatas, Fake news detection within online social media using supervised artificial intelligence algorithms, Physica A: Statistical Mechanics and its Applications 540 (2020) 123174. doi:10.1016/j.physa.2019.123174.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Y. Zhao, J. Da, J. Yan, Detecting health misinformation in online health communities: Incorporating behavioral features into machine learning based approaches, Information Processing &amp; Management 58 (2021) 102390. doi:10.1016/j.ipm.2020.102390.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. Surowiecki, The Wisdom of Crowds, Anchor, 2005.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Pennycook, D. G. Rand, Fighting misinformation on social media using crowdsourced judgments of news source quality, Proceedings of the National Academy of Sciences 116 (2019) 2521–2526. doi:10.1073/pnas.1806781116.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] D. La Barbera, K. Roitero, D. Spina, S. Mizzaro, G. Demartini, Crowdsourcing Truthfulness: The Impact of Judgment Scale and Assessor Bias, in: Proceedings of the 42nd European Conference on Information Retrieval, ECIR, Springer, 2020, pp. 207–214.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>Can The Crowd Identify Misinformation Objectively? The Effects of Judgment Scale and Assessor's Background</article-title>
          , in:
          <source>Proceedings of the 43rd Conference on Research and Development in Information Retrieval</source>
          , SIGIR, ACM,
          <year>2020</year>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>448</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Portelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Della Mea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively?</article-title>
          , in:
          <source>Proceedings of the 29th Conference on Information &amp; Knowledge Management</source>
          , CIKM, ACM,
          <year>2020</year>
          , pp.
          <fpage>1305</fpage>
          -
          <lpage>1314</lpage>
          . doi:10.1145/3340531.3412048.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Portelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Della Mea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>Can the Crowd Judge Truthfulness? A Longitudinal Study on Recent Misinformation about COVID-19</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          (
          <year>2021</year>
          ). doi:10.1007/s00779-021-01604-6.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>La Barbera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ceolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>The many dimensions of truthfulness: Crowdsourcing misinformation assessments on a multidimensional scale</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102710</fpage>
          . doi:10.1016/j.ipm.2021.102710.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Arechar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pennycook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Rand</surname>
          </string-name>
          ,
          <article-title>Scaling up fact-checking using the wisdom of crowds</article-title>
          ,
          <source>Science Advances</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <fpage>eabf4393</fpage>
          . doi:10.1126/sciadv.abf4393.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Draws</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. La</given-names>
            <surname>Barbera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ceolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Checco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <article-title>The Effects of Crowd Worker Biases in Fact-Checking Tasks</article-title>
          , in:
          <source>Conference on Fairness, Accountability, and Transparency</source>
          , FAccT, ACM,
          <year>2022</year>
          , pp.
          <fpage>2114</fpage>
          -
          <lpage>2124</lpage>
          . doi:10.1145/3531146.3534629.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shabani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Charlesworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sokhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schuldt</surname>
          </string-name>
          ,
          <article-title>SAMS: Human-in-the-loop Approach to Combat the Sharing of Digital Misinformation</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2846</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vega-Oliveros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Seibt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <article-title>Scalable Fact-checking with Human-in-the-Loop</article-title>
          , in:
          <source>IEEE Workshop on Information Forensics and Security</source>
          , WIFS,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.1109/WIFS53200.2021.9648388.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karagiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Trummer</surname>
          </string-name>
          ,
          <article-title>Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification</article-title>
          , CoRR abs/2003.06708 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <article-title>Human-in-the-loop Artificial Intelligence for Fighting Online Misinformation: Challenges and Opportunities</article-title>
          ,
          <source>Bulletin of IEEE Computer Society</source>
          <volume>43</volume>
          (
          <year>2020</year>
          )
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Köhler</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Degli Esposti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>495</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stammbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>Team DOMLIN: Exploiting evidence enhancement for the FEVER shared task</article-title>
          , in:
          <source>Proceedings of the 2nd Workshop on Fact Extraction and VERification</source>
          , FEVER, ACL,
          <year>2019</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>109</lpage>
          . doi:10.18653/v1/D19-6616.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Konstantinovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Babakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Toward Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection</article-title>
          ,
          <source>Digital Threats</source>
          <volume>2</volume>
          (
          <year>2021</year>
          ). doi:10.1145/3412869.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Viviani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pasi</surname>
          </string-name>
          ,
          <article-title>Credibility in social media: opinions, news, and health information-a survey</article-title>
          ,
          <source>WIREs Data Mining and Knowledge Discovery</source>
          <volume>7</volume>
          (
          <year>2017</year>
          )
          <fpage>e1209</fpage>
          . doi:10.1002/widm.1209.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>La Barbera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>Combining Human and Machine Confidence in Truthfulness Assessment</article-title>
          ,
          <source>Data and Information Quality</source>
          (
          <year>2022</year>
          ). doi:10.1145/3546916.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sharir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peleg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shoham</surname>
          </string-name>
          ,
          <article-title>The Cost of Training NLP Models: A Concise Overview</article-title>
          , arXiv (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <article-title>Large Language Models are Zero-Shot Reasoners</article-title>
          ,
          in:
          <source>Workshop on Knowledge Retrieval and Language Models</source>
          , ICML,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Simonsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <article-title>Generating Fact Checking Explanations</article-title>
          , CoRR abs/2004.05773 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kotonya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Explainable Automated Fact-Checking: A Survey</article-title>
          , CoRR abs/2011.03870 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Granik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mesyura</surname>
          </string-name>
          ,
          <article-title>Fake news detection using naive Bayes classifier</article-title>
          , in:
          <source>IEEE First Ukraine Conference on Electrical and Computer Engineering</source>
          , UKRCON,
          <year>2017</year>
          , pp.
          <fpage>900</fpage>
          -
          <lpage>903</lpage>
          . doi:10.1109/UKRCON.2017.8100379.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>T.</given-names>
            <surname>Draws</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Inel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>A Checklist to Combat Cognitive Biases in Crowdsourcing</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Human Computation and Crowdsourcing</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>48</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-J.</given-names>
            <surname>Jeng</surname>
          </string-name>
          ,
          <article-title>Event-based blackboard architecture for multi-agent systems</article-title>
          , in:
          <source>Proceedings of the Conference on Information Technology: Coding and Computing</source>
          , volume
          <volume>2</volume>
          <source>of ITCC</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>379</fpage>
          -
          <lpage>384</lpage>
          . doi:10.1109/ITCC.2005.149.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Attenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          ,
          <article-title>Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns”</article-title>
          ,
          <source>J. Data and Information Quality</source>
          <volume>6</volume>
          (
          <year>2015</year>
          ). doi:10.1145/2700832.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lakkaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <article-title>Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration</article-title>
          , in:
          <source>Proceedings of the 31st AAAI Conference on Artificial Intelligence</source>
          , AAAI, AAAI Press,
          <year>2017</year>
          , pp.
          <fpage>2124</fpage>
          -
          <lpage>2132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Matute</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lasecki</surname>
          </string-name>
          ,
          <article-title>Towards Hybrid Human-AI Workflows for Unknown Unknown Detection</article-title>
          , in:
          <source>Proceedings of The Web Conference</source>
          , WWW, ACM,
          <year>2020</year>
          , pp.
          <fpage>2432</fpage>
          -
          <lpage>2442</lpage>
          . doi:10.1145/3366423.3380306.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>B.</given-names>
            <surname>Taboubi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. B.</given-names>
            <surname>Nessir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Haddad</surname>
          </string-name>
          ,
          <article-title>iCompass at CheckThat! 2022: combining deep language models for fake news detection</article-title>
          ,
          <source>Working Notes of CLEF</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>C.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pleiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>On Calibration of Modern Neural Networks</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Machine Learning</source>
          , volume
          <volume>70</volume>
          of ICML, JMLR.org,
          <year>2017</year>
          , pp.
          <fpage>1321</fpage>
          -
          <lpage>1330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>D.</given-names>
            <surname>La Barbera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mackenzie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <article-title>BUM at CheckThat! 2022: A Composite Deep Learning Approach to Fake News Detection using Evidence Retrieval</article-title>
          ,
          <source>in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stammbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ash</surname>
          </string-name>
          ,
          <article-title>The Choice of Knowledge Base in Automated Claim Checking</article-title>
          ,
          <source>CoRR abs/2111.07795</source>
          (
          <year>2021</year>
          ). arXiv:2111.07795.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pérez-Rosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News</article-title>
          ,
          <source>CoRR abs/2104.12918</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roitero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soprano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <article-title>A Neural Model to Jointly Predict and Explain Truthfulness of Statements</article-title>
          ,
          <source>Journal of Data and Information Quality</source>
          (
          <year>2022</year>
          ). doi:10.1145/3546917.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stiennon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Irving</surname>
          </string-name>
          ,
          <article-title>Fine-Tuning Language Models from Human Preferences</article-title>
          ,
          <source>CoRR abs/1909.08593</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>C.</given-names>
            <surname>Snijders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Conijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fouw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berlo</surname>
          </string-name>
          ,
          <article-title>Humans and Algorithms Detecting Fake News: Effects of Individual and Contextual Confidence on Trust in Algorithmic Advice</article-title>
          ,
          <source>International Journal of Human-Computer Interaction</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi:10.1080/10447318.2022.2097601.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Cheaper and Better: Selecting Good Workers for Crowdsourcing</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Human Computation and Crowdsourcing</source>
          <volume>3</volume>
          (
          <year>2015</year>
          )
          <fpage>20</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>