<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving the Reliability of Health Information Credibility Assessments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcos Fernández-Pichel</string-name>
          <email>marcosfernandez.pichel@usc.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Selina Meyer</string-name>
          <email>selina.meyer@ur.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Bink</string-name>
          <email>markus.bink@ur.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Frummet</string-name>
          <email>alexander.frummet@ur.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David E. Losada</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Elsweiler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade de Santiago de Compostela</institution>
          ,
          <addr-line>Santiago de Compostela</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Information Science, Regensburg University</institution>
          ,
          <addr-line>Regensburg, Bavaria</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The applicability of retrieval algorithms to real data relies heavily on the quality of the training data. Currently, the creation process of training and test collections for retrieval systems is often based on annotations produced by human assessors following a set of guidelines. Some concepts, however, are prone to subjectivity, which could restrict the utility of any algorithm developed with the resulting data in real-world applications. One such concept is credibility, which is an important factor in users' judgements on whether retrieved information helps to answer an information need. In this paper, we evaluate an existing set of assessment guidelines with respect to their ability to generate reliable credibility judgements across multiple raters. We identify reasons for disagreement and adapt the guidelines to create an actionable and traceable annotation scheme that i) leads to higher inter-annotator reliability, and ii) can inform about why a rater made a specific credibility judgement. We provide promising evidence about the robustness of the new guidelines and conclude that they could be a valuable resource for building future test collections for misinformation detection.</p>
      </abstract>
      <kwd-group>
        <kwd>reliability</kwd>
        <kwd>credibility assessments</kwd>
        <kwd>health-related content</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Misinformation on the Internet is becoming increasingly common [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. When interacted with,
such information can cause people to make decisions with potentially harmful consequences
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], especially when the information is related to health [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Multiple prestigious venues
regularly organise shared-task competitions with the goal of developing retrieval or classification
algorithms that counteract misinformation on the web [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. While these initiatives are
highly valuable to fostering research in misinformation detection, the potential impact of
the algorithms developed by participants may be restricted by the quality of the provided
training and test data. Generating ground truth data is a crucial and costly process, as it often
requires the intervention of human assessors for creating complex assessments. For instance,
in the TREC Health Misinformation Track judge(s)<sup>1</sup> are asked to label documents on topical
relevance, credibility, and correctness<sup>2</sup>. Some of these dimensions, however, are hard to assess
and annotation practices vary across competitions. One such hard-to-assess concept is credibility,
which has been shown to be highly subjective and susceptible to individual differences [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Here
we adopt the term “credibility”, as defined by the TREC track: the document’s trustworthiness
and authoritativeness, as perceived by the assessors [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This is intrinsically a subjective concept,
but robust guidelines can help in clarifying the label creation process and, thus, produce more
solid benchmarks.
      </p>
      <p><sup>1</sup> It is unclear from the TREC overview if multiple assessors were employed.</p>
      <p><sup>2</sup> https://trec-health-misinfo.github.io/docs/TREC-2021-Health-Misinformation-Track-Assessing-Guidelines_Version-2.pdf</p>
      <p>In this study, we try to shed light on the difficulty of creating credibility assessments and propose
new guidelines to produce more robust judgements. We do this by applying the current TREC
Health Misinformation credibility guidelines to a series of health-related web documents and
evaluating agreement across multiple raters. We then identify reasons for rater disagreement
and adapt the existing guidelines to create an actionable and traceable annotation scheme
that i) leads to higher inter-annotator reliability, and ii) can inform about why a rater made a
specific credibility judgement. We provide promising evidence about the robustness of the new
guidelines and conclude that they could be a valuable resource for building test collections for
misinformation detection and thus mitigating the impact of health misinformation on the web.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Research on credibility examines how characteristics, such as expertise or trustworthiness,
affect the “believability” of an information source [
        <xref ref-type="bibr" rid="ref9 ref10">9, 10</xref>
        ]. Credibility judgements for web pages
have been studied extensively and are known to be influenced by aesthetics and impressions
of professionalism [11, 12]. Credibility is a largely subjective concept and can be strongly
influenced by an individual rater’s personal disposition, e.g. how suspicious they are by nature,
their propensity to take risks [12], as well as their reading abilities [13]. There is mixed evidence with
respect to the influence of topical knowledge and expertise [14, 15].
      </p>
      <p>The subjectivity inherent to credibility judgements demands clear and specific
guidelines for the development of test collections, so that robust annotations can be
obtained that accurately reflect population judgements overall. One attempt at creating
such guidelines was presented by Nabożny et al. [16] in the context of medical misinformation.
In contrast to our aims, these authors directed their annotation protocol towards medical experts
and focused on sentence credibility, marked mainly by the presentation of factual and reliable
information, rather than web page credibility. Zhang and colleagues [17] proposed a set of
article credibility indicators, which included hard-to-spot indicators such as logical fallacies and
tone. Their approach, which requires trained annotators, assigned content- and context-based
annotation to individuals with different levels of training. The results showed low inter-rater
agreement on some of the items, again highlighting the difficulty of credibility annotation.</p>
      <p>We examine here a set of guidelines for judging web page credibility which, to our knowledge,
have not yet been publicly evaluated, namely the TREC Health Misinformation track guidelines<sup>2</sup>.
We then adapt this framework to create new guidelines which, in preliminary tests, show first evidence of leading to
higher inter-rater reliability with minimal or no training, thus enabling
non-domain experts to judge the credibility of health-related web pages.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluating Guidelines</title>
      <p>According to the TREC Health Misinformation track’s guidelines, a highly credible document
in the context of health should be “unquestionably trustworthy and authoritative”, whereas for
low-credibility documents “there is little evidence to believe or trust the information source”,
with medium-credibility documents located in-between. To help determine the credibility level
of a document, human assessors are provided with an extensive list of guidelines to follow<sup>2</sup>.
Information about the number of assessors recruited to judge the credibility of documents
in this TREC collection is not publicly available. There is also no public information about
inter-rater agreement in the track. To evaluate the reliability of these judgements, four of the
authors of this paper followed these guidelines to independently judge 12 randomly chosen web
documents from an existing collection of documents from the medical domain [18, 19]. Next,
we calculated pairwise, linear-weighted Cohen’s Kappa to evaluate agreement between single
raters, and Krippendorff’s Alpha for ordinal scales to evaluate agreement between all raters.
Kappa values ranged between 0.25 and 0.79, with a median of κ = 0.44. Krippendorff’s α was
0.6. These Kappa values indicate only moderate agreement on average [20], and α falls below
the lowest conceivable limit of α ≥ 0.667 for reliable annotations (as defined by Krippendorff
[21]).</p>
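      <p>For illustration, the agreement computation just described can be reproduced along the following lines. This is a minimal Python sketch, not the code used in the study: the rating matrix is invented and the availability of the scikit-learn and krippendorff packages is an assumption.</p>
      <preformat>
# A minimal sketch (not the authors' code) of the agreement computation described
# above: pairwise linear-weighted Cohen's kappa and Krippendorff's alpha for
# ordinal data. The rating matrix below is invented; scikit-learn and the
# "krippendorff" package are assumed to be installed.
from itertools import combinations
from statistics import median

import krippendorff
from sklearn.metrics import cohen_kappa_score

# Hypothetical credibility labels (0 = low, 1 = medium, 2 = high) assigned by
# four raters to the same twelve documents; rows = raters, columns = documents.
ratings = [
    [2, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 2],
    [2, 1, 0, 1, 1, 0, 2, 2, 1, 0, 2, 2],
    [2, 2, 0, 2, 1, 1, 2, 2, 1, 0, 1, 2],
    [1, 1, 0, 2, 0, 0, 2, 2, 1, 0, 1, 1],
]

# Pairwise linear-weighted Cohen's kappa between every pair of raters.
kappas = [
    cohen_kappa_score(a, b, weights="linear")
    for a, b in combinations(ratings, 2)
]
print(f"median pairwise kappa: {median(kappas):.2f}")

# Krippendorff's alpha treating the three credibility levels as an ordinal scale.
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha (ordinal): {alpha:.2f}")
      </preformat>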
      <p>Next, the four annotators discussed their judgements in a group meeting to identify the
problems with the current guidelines. Three main reasons were found behind the low agreement
between raters. First, the lengthy and unstructured nature of the guidelines. In some cases,
it was difficult to pinpoint which part of the guidelines had led the annotators to decide on a
certain judgement. The guidelines are unnumbered, despite consisting of 13 bullet points, and
some items could have been broken down into multiple related aspects (for example, “Try to
determine the amount of expertise, authoritativeness, and trustworthiness of the document”).
This makes the process less traceable and causes raters to focus on different aspects, leading to
divergent credibility judgements. Second, the lack of a clear-cut differentiation between levels
of credibility. While three levels of credibility are defined, the guidelines give no indication
on how these definitions relate or where cut-offs lie. It is left to the annotator to decide
how important each guideline item is for the credibility of the document. Third, the use of
ambiguous concepts as a way to judge credibility. The guidelines introduce new, difficult-to-define,
and comparably subjective concepts to act as credibility indicators. These include,
e.g., trustworthiness, authoritativeness, expertise, and ubiquity. These issues informed the
development of the new guidelines proposed below.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>The new credibility guidelines, the credibility label each one maps to, and the corresponding step in the flowchart (Figure 1).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Guideline</th>
              <th>Label</th>
              <th>Description</th>
              <th>Step</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>G1</td>
              <td>2</td>
              <td>Source is a scientific paper, or a medical publisher or hospital/clinic or government website or university.</td>
              <td>1</td>
            </tr>
            <tr>
              <td>G2</td>
              <td>1</td>
              <td>Document is citing the information they provide in their articles. They provide links or specific references to their sources. They cite sources with credibility 2 (i.e. medical publications and/or lab studies).</td>
              <td>4</td>
            </tr>
            <tr>
              <td>G3</td>
              <td>1</td>
              <td>Document is written by an expert in the field/someone qualified to write this document (irrespective of publishing venue).</td>
              <td>3</td>
            </tr>
            <tr>
              <td>G4</td>
              <td>0</td>
              <td>The document is actually for advertising or marketing purposes. If so, the website might be biased or a scam designed to trick people into fake treatments or into buying medical products that do not live up to their claim.</td>
              <td>2</td>
            </tr>
            <tr>
              <td>G5</td>
              <td>0</td>
              <td>The information is posted by a non-expert person providing a medical product review or providing medical advice without proper citations (links/list of references).</td>
              <td>5</td>
            </tr>
            <tr>
              <td>G6</td>
              <td>0</td>
              <td>The website provides or states claims that go against well-known medical consensus (e.g. smoking cigarettes does not cause cancer).</td>
              <td>5</td>
            </tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <fn>
            <p>NOTE: It is generally allowed to look up authors to check whether they have the required knowledge to be regarded as an expert and to look up websites to find out if they are legitimate.</p>
          </fn>
        </table-wrap-foot>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>4. A Robust and Traceable Set of Credibility Guidelines</title>
      <p>The TREC guidelines were taken as a starting point and the individuals involved in the initial
annotation iteratively revised the guidelines in multiple discussion sessions. At the end of
this process, the original guidelines had been condensed and adapted to six guidelines for
webpage credibility labelling (see Table 1). The new guidelines provide clear credibility score
recommendations based on the fulfilment of a number of criteria, summarise the TREC guidelines
in a way that reflects the most important aspects and, at the same time, decrease ambiguity.
The full guidelines are presented in this paper both in written form and as a flowchart (see Table 1 and Figure 1) to facilitate
and speed up the evaluation with a visual tool. Each guideline can
be mapped to a specific step in the flowchart. The new guidelines forgo the introduction of
complicated concepts as much as possible and rely mostly on measurable indicators.</p>
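      <p>To make the mapping between guidelines, labels, and flowchart steps concrete, the sketch below gives one possible reading of Table 1 and Figure 1 in code: the guidelines are checked in their step order and the first one that applies determines the label. The boolean field names are hypothetical and introduced only for this illustration; they are not part of the guidelines themselves.</p>
      <preformat>
# Illustrative (assumed) encoding of the flowchart in Figure 1 / Table 1.
from dataclasses import dataclass

@dataclass
class PageAssessment:
    # Hypothetical per-document answers an assessor would give for each guideline.
    reputable_source: bool          # G1: scientific paper, medical publisher, clinic, government site, university
    advertising_or_scam: bool       # G4: page mainly markets or sells a product, possibly a scam
    expert_author: bool             # G3: written by a qualified expert
    cites_credible_sources: bool    # G2: links/references to credibility-2 sources
    claims_against_consensus: bool  # G6: contradicts well-known medical consensus

def credibility_label(page: PageAssessment) -> int:
    """Return 2 (high), 1 (medium) or 0 (low) following the step order of Table 1."""
    if page.reputable_source:        # Step 1 (G1) -> label 2
        return 2
    if page.advertising_or_scam:     # Step 2 (G4) -> label 0
        return 0
    if page.expert_author:           # Step 3 (G3) -> label 1
        return 1
    if page.cites_credible_sources:  # Step 4 (G2) -> label 1
        return 1
    # Step 5: non-expert content without citations (G5) or claims against
    # medical consensus (G6) -> label 0
    return 0
      </preformat>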
      <p>The same 12 documents from the initial evaluation were then annotated again by the same
four raters using the new guidelines. This led to an increase in agreement by 28%, resulting in a
Krippendorff’s α of 0.88 and a median Cohen’s κ of 0.89 (max: κ = 1, min: κ = 0.78), indicating
almost perfect agreement.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation with a New Sample of Webpages</title>
        <p>Since the assessors were already familiar with the initial 12 documents and had used them to
inform the guideline development, we needed to extend the evaluation of the new guidelines to
other documents. To that end, a new, previously unseen sample of 12 webpages was randomly
drawn from the same document collection and again annotated by the four raters. An even
higher agreement of α = 0.93 and median κ = 0.88 (min: κ = 0.78, max: κ = 1) was achieved on
these new documents. Of this selection of documents, 40% were labelled with credibility 1, 37%
with credibility 2 (the highest credibility), and 20% as not credible at all. This evaluation with
the new sample demonstrates the effectiveness of our guidelines and, more importantly, the
general need for more specific instructions that produce more robust labels for concepts prone
to subjective differences.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation with External Assessors</title>
        <p>Four external assessors (assessors 1-4), who were not involved in the design process, were recruited
to further evaluate the guidelines. These assessors were recruited from our research group
but were not involved in this project. They are familiar with search technologies, but they are
non-experts in the medical domain. Assessor 1 was trained in a 15-minute conversation, in which open
questions were answered. The three remaining assessors were not trained (in order to test
how well the annotation process works with no prior knowledge). Krippendorff’s α was then
recalculated, taking both the authors’ and external assessors’ judgements into account. While
the score decreased compared to the agreement between only the original assessors, at α = 0.72
it is still substantial and 12% higher than on the original guidelines. However, the pairwise
Cohen’s Kappa scores revealed significant differences in agreement between different raters
(with scores ranging between κ = 0.18 and κ = 1). As expected, assessor 1 obtained a higher agreement
with the authors. This suggests that credibility judgements, while subjective at first glance, can
become more objective with a short training. Assessor 3 produced comparably low-agreement
judgements, while the agreement for the remaining assessors yielded a Krippendorff’s α of 0.82
and κ = 0.80 (considering all external assessors but assessor 3). In practice, we could even consider
removing assessor 3’s judgments, as excluding low-agreement workers is a common procedure in the
literature [22]. We include a discussion of these individual differences and an error analysis in
Section 5 along with some general conclusions.</p>
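        <p>The effect of excluding a low-agreement assessor, as discussed above, can be checked with a few lines of code. The following is a minimal sketch with invented ratings, again assuming the krippendorff package; it is not the analysis code used in the study.</p>
        <preformat>
# Minimal sketch: Krippendorff's alpha (ordinal) with and without one low-agreement rater.
# The ratings are invented; rows = raters, columns = documents.
import krippendorff

all_ratings = [
    [2, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1, 2],  # original rater
    [2, 1, 0, 1, 1, 0, 2, 2, 1, 0, 2, 2],  # original rater
    [2, 2, 0, 2, 1, 1, 2, 2, 1, 0, 1, 2],  # trained external assessor
    [1, 0, 2, 2, 0, 2, 2, 0, 1, 2, 1, 0],  # low-agreement external assessor
]

with_all = krippendorff.alpha(reliability_data=all_ratings,
                              level_of_measurement="ordinal")
without_outlier = krippendorff.alpha(reliability_data=all_ratings[:-1],
                                     level_of_measurement="ordinal")
print(f"alpha with all raters: {with_all:.2f}")
print(f"alpha without outlier: {without_outlier:.2f}")
        </preformat>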
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Guideline Traceability</title>
        <p>One of the goals of our new guidelines was to make raters’ decisions traceable and explainable.
We thus asked each annotator to note down not only a credibility label, but also the guideline
they based their decision on. This allows the systematic evaluation of the quality of credibility
judgements and can reveal potential misunderstandings and reasons for disagreements, enabling
the incremental improvement of the guidelines. Agreement on the guidelines was calculated
using Krippendorff’s α for nominal scales and was high at α = 0.77. Including the external
assessors led to a decreased agreement score of α = 0.51. However, not considering assessor 3’s
judgements would lead to an increase in agreement (α = 0.59). While agreement on the
guidelines is lower than on the credibility labels, they mainly serve as an explainability tool
and were used for the error analysis included in Section 5. Observing the concrete decisions
over the documents, 38% were labelled using G1 (source has a scientific basis), 17% under G2
(proper citations), 22% under G3 (written by an expert), 9% under G4 (advertising purposes),
11% under G5 (written by a non-expert without citations), and, surprisingly, none fell under the G6
category (claims against well-known medical consensus).</p>
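        <p>The traceability analysis can be computed in the same way as the label agreement. The sketch below uses invented guideline choices and assumes the krippendorff package; it shows nominal-scale agreement on the chosen guideline plus the share of decisions attributed to each guideline.</p>
        <preformat>
# Minimal sketch of the traceability analysis (invented data, krippendorff package assumed).
from collections import Counter

import krippendorff

# Hypothetical guideline choices (1..6 for G1..G6); rows = raters, columns = documents.
guideline_choices = [
    [1, 2, 3, 1, 5, 4, 1, 1, 3, 2, 5, 1],
    [1, 2, 3, 1, 5, 4, 1, 3, 3, 2, 5, 1],
    [1, 3, 3, 1, 5, 4, 1, 1, 3, 2, 4, 1],
    [1, 2, 2, 1, 5, 4, 1, 1, 3, 5, 5, 1],
]

# Agreement on which guideline was used, treated as a nominal scale.
alpha = krippendorff.alpha(reliability_data=guideline_choices,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha on guideline choice (nominal): {alpha:.2f}")

# Share of all individual decisions made under each guideline.
counts = Counter(g for rater in guideline_choices for g in rater)
total = sum(counts.values())
for g in sorted(counts):
    print(f"G{g}: {counts[g] / total:.0%}")
        </preformat>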
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>From analysing the assessors’ input, we found some differences in the interpretation of the guidelines
between assessor 3 and the other assessors. Sources of disagreement were whether dentistry websites
should be judged as clinic sources and whether websites such as “MedicineNet” count as medical publishers
(G1). Thus, a possible improvement could be to remove this terminology from G1, as the term
“government website” already encompasses health publishers like the CDC or the WHO. On
the other hand, some dentistry websites displaying their phone number and an invitation to book
an appointment next to blog articles were labelled as advertisements (G4) by some annotators,
whereas others interpreted them as written by medical experts (G3). Therefore, amending G4’s
wording to “the website/article is trying to sell a product, and we may conclude from its content
that it is a fake” would be an improvement.</p>
      <p>The main finding of this study is that well-defined guidelines lead to higher quality labels
and more robust agreement among the judges. While there is still room for improvement in the
proposed guidelines, we have observed that, in our experiment, even a brief teaching process can
lead to more coherent labels. Although the number of reviewers and annotated
documents was limited (a constraint that is hard to avoid in our framework), the signals we obtained are promising. This
could be a significant shift for TREC-like initiatives, raising the quality of the labels generated and
increasing the realism of evaluation. In addition, we consider the process’ increased cost and
difficulty to be manageable.</p>
      <p>One of the limitations of this study is that we cannot yet ascertain that the proposed guidelines
sufficiently capture the actual credibility of a document. While they certainly reflect quality
and may cover specific aspects of credibility, past research has shown that users seldom judge
credibility based on source or quality [11, 23]. We plan to address this by comparing real users’
subjective credibility judgements with annotations based on our guidelines. Additionally, we
want to see how applying these guidelines to judge the credibility of health-related websites
affects experimental results from prior years and to what extent introducing these guidelines to
users can improve their ability to judge the credibility and quality of websites.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this paper, we have demonstrated the difficulty of assessing webpages in terms of credibility.
Our main contribution is a set of guidelines to create robust annotations that can be further
improved by providing brief training to the raters. In future work, we intend to keep polishing
these guidelines and run a user study to understand the relationship between credibility and the
assigned labels in more detail. We are also interested in providing laypeople with the guidelines
to see whether this improves their ability to judge the credibility/quality of web contents.
We hope that the proposed tool can not only improve annotation processes for producing
high-quality training data, but also have a positive impact on users’ perceptions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors thank the support obtained from: i) project PLEC2021-007662
(MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia
Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next
GenerationEU), and ii) the Xunta de Galicia - Consellería de Cultura, Educación, Formación
Profesional e Universidades (Centro de investigación de Galicia accreditation 2019-2022
ED431G-2019/04 and Reference Competitive Group accreditation 2022-2025, ED431C 2022/19)
and the European Union (European Regional Development Fund - ERDF).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Eysenbach</surname>
          </string-name>
          ,
          <article-title>Infodemiology: The epidemiology of (mis) information</article-title>
          ,
          <source>The American Journal of Medicine</source>
          <volume>113</volume>
          (
          <year>2002</year>
          )
          <fpage>763</fpage>
          -
          <lpage>765</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Rieh</surname>
          </string-name>
          ,
          <article-title>Judgment of information quality and cognitive authority in the web</article-title>
          ,
          <source>Journal of the American society for Information Science and Technology</source>
          <volume>53</volume>
          (
          <year>2002</year>
          )
          <fpage>145</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Pogacar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghenai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Smucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <article-title>The positive and negative influence of search results on people's decisions about the efficacy of medical treatments</article-title>
          ,
          <source>in: Proceedings of the ACM SIGIR Int. Conf. on Theory of Information Retrieval</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Vigdor</surname>
          </string-name>
          ,
          <article-title>Man fatally poisons himself while self-medicating for coronavirus, doctor says</article-title>
          ,
          <year>2020</year>
          . URL: https://www.nytimes.com/2020/03/24/us/chloroquine-poisoning-coronavirus.html [accessed June 9, 2022].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smucker</surname>
          </string-name>
          ,
          <article-title>Overview of the trec 2021 health misinformation track</article-title>
          ,
          <source>in: Proceedings of the Thirtieth Text REtrieval Conference</source>
          , TREC,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <article-title>Overview of the trec 2020 health misinformation track</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth Text REtrieval Conference</source>
          , TREC,
          <year>2020</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino, T. Elsayed,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          , et al.,
          <article-title>The clef-2021 checkthat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kąkol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jankowski-Lorek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Abramczuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wierzbicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <article-title>On the subjectivity and bias of web content credibility evaluations</article-title>
          ,
          <source>in: Proceedings of the 22nd international conference on world wide web</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1131</fpage>
          -
          <lpage>1136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Fogg</surname>
          </string-name>
          ,
          <article-title>Persuasive technology: using computers to change what we think and do</article-title>
          ,
          <source>Ubiquity</source>
          <year>2002</year>
          (
          <year>2002</year>
          )
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. L. Ginsca, A. Popescu, M. Lupu, et al., Credibility in information retrieval, Foundations and Trends in Information Retrieval 9 (2015) 355–475.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] B. J. Fogg, Prominence-interpretation theory: Explaining how people assess credibility online, in: CHI’03 extended abstracts on human factors in computing systems, 2003, pp. 722–723.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] D. H. McKnight, C. J. Kacmar, Factors and effects of information credibility, in: Proceedings of the ninth international conference on Electronic commerce, 2007, pp. 423–432.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] C. Hahnel, F. Goldhammer, U. Kröhne, J. Naumann, The role of reading skills in the evaluation of online information gathered from search engine environments, Computers in Human Behavior 78 (2018) 223–234.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. S. Eastin, Credibility assessments of online health information: The effects of source expertise and knowledge of content, Journal of Computer-Mediated Communication 6 (2001) JCMC643.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] J. Unkel, A. Haas, The effects of credibility cues on the selection of search engine results, Journal of the Association for Information Science and Technology 68 (2017) 1850–1862.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Nabożny, B. Balcerzak, A. Wierzbicki, M. Morzy, M. Chlabicz, et al., Active annotation in evaluating the credibility of web-based medical information: Guidelines for creating training data sets for machine learning, JMIR medical informatics 9 (2021) e26065.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. X. Zhang, A. Ranganathan, S. E. Metz, S. Appling, C. M. Sehat, N. Gilmore, N. B. Adams, E. Vincent, J. Lee, M. Robbins, et al., A structured response to misinformation: Defining and annotating credibility indicators in news articles, in: Companion Proceedings of the The Web Conference 2018, 2018, pp. 603–612.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] M. Bink, S. Zimmerman, D. Elsweiler, Featured snippets and their influence on users’ credibility judgements, in: ACM SIGIR Conference on Human Information Interaction and Retrieval, 2022, pp. 113–122.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] S. Zimmerman, A. Thorpe, C. Fox, U. Kruschwitz, Privacy nudging in search: Investigating potential impacts, in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, 2019, pp. 283–287.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] M. L. McHugh, Interrater reliability: the kappa statistic, Biochemia medica 22 (2012) 276–282.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] K. Krippendorff, Content analysis: An introduction to its methodology, Sage publications, 2018.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] D. Feng, S. Besana, R. Zajac, Acquiring high quality non-expert knowledge from on-demand workforce, in: Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web), 2009, pp. 51–56.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. Caverlee, L. Liu, Countering web spam with credibility-based link analysis, in: Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, 2007, pp. 157–166.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>