A user study on people's perception of the credibility of online health information

Marcos Fernández-Pichel¹,*, Markus Bink², David E. Losada¹ and David Elsweiler²
¹ Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
² Chair of Information Science, Universität Regensburg, Regensburg, Germany

ROMCIR 2024: The 4th Workshop on Reducing Online Misinformation through Credible Information Retrieval (held as part of ECIR 2024: the 46th European Conference on Information Retrieval), March 24, 2024, Glasgow, UK
* Corresponding author.
Email: marcosfernandez.pichel@usc.es (M. Fernández-Pichel); markus.bink@ur.de (M. Bink); david.losada@usc.es (D. E. Losada); david.elsweiler@ur.de (D. Elsweiler)
ORCID: 0000-0002-6560-9832 (M. Fernández-Pichel); 0000-0002-3444-0990 (M. Bink); 0000-0001-8823-7501 (D. E. Losada); 0000-0002-5791-0641 (D. Elsweiler)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract
Judging the credibility of information is a subjective process that is prone to biases. This issue is especially concerning in health information seeking. Some efforts have been made to define robust credibility assessment guidelines that support the development of reliable test collections. This is of the utmost importance, since the applicability of retrieval algorithms to real use-case scenarios relies on the quality of the labelled data. Yet, the question persists as to whether the labels created with these guidelines can effectively serve as a surrogate for the genuine judgements of credibility as perceived by end-users. Motivated by this, we conducted a user study with 1,000 participants. We demonstrate that there is a correlation between participants' judgements and the reference values produced following existing guidelines. Further analyses of the data reveal worrying insights into people's ability to judge the credibility of online medical content, a limitation that can lead to personal harm.

Keywords
Health-related content, Credibility, User study

1. Introduction

The Internet has become the dominant platform for accessing health information, offering convenient access to a wealth of medical knowledge [1, 2, 3]. Nonetheless, the abundance of information poses a challenge for users in discerning trustworthy sources from unreliable ones, potentially resulting in ill-informed choices regarding their health [4, 5, 6, 7]. In extreme cases, this situation can have severe consequences and even pose a risk to personal well-being [8].

Credibility has been defined as the extent to which information from a webpage or other online source can be believed [9]. It is a highly subjective concept that is susceptible to individual differences, such as users' reading skills [10, 11]. The subjective nature of credibility represents a barrier to creating reliable and robust test collections. In the context of shared-task evaluation campaigns, some researchers have critically analysed the quality of credibility assessments and proposed a set of robust and traceable guidelines to improve the robustness of annotations [12]. This is important since the applicability of retrieval and machine learning algorithms relies on the quality of the annotation process.
Nevertheless, there is still a need for a rigorous examination of the relationship between this type of guideline and credibility as real end-users perceive it. While we know that judgements vary across users, annotations need to be both consistent and reflective of average users' perceptions. Previous research has studied the main elements influencing individual credibility perceptions [13, 14]. However, no user-oriented study has attempted to understand annotation practices for shared tasks. Such a study could also provide valuable cues on how people evaluate the credibility of websites posting medical information.

In this work, we perform a study to understand how end-users perceive the credibility of online health information¹. The ultimate goal is to determine whether the labels created from guidelines can serve as a surrogate for the credibility perceived by end-users. We attempt to answer the following research questions:

• RQ1. Can current credibility annotation guidelines act as a proxy for end-users' real perception of credibility?
• RQ2. To what extent are users able to recreate the judgements of experts?
• RQ3. How do user variables, such as familiarity with the search topic, educational background, and other human factors, affect the user's perception of credibility?

2. Related work

The credibility of online information and the spread of misinformation have been extensively studied [15, 16, 17, 18]. Viviani and Pasi reviewed the main automatic methods to estimate credibility in social media, focusing mainly on health content [19]. A further body of work has sought to understand how end-users assess the credibility of online content and why people make certain assessments. For instance, Fogg defined the prominence-interpretation theory, which helps to determine which website elements influence end-users' credibility perceptions [13]. This theory was later tested through a user study involving 2,500 participants, where the authors found that 46% of the users mentioned design as a critical aspect influencing credibility [20]. Eastin [21] demonstrated that both the source and prior knowledge about the content influence users' perception of online health information. Other studies have also demonstrated that, apart from the characteristics of the web elements, the receiver's characteristics influence the perception of the information [22]. Other researchers analysed in depth the factors that influence end-users' perceived credibility [15]. Previous studies have also evaluated the correlation between different users' judgements to test their feasibility as ground truth values [23, 24].

In this paper, we present a systematic user study that shows how well expert annotations reflect the subjective judgements of a broad population of users. We also evaluate a number of personal factors that may influence the credibility estimations.

¹ https://github.com/MarcosFP97/perceived-credibility-study

Topic id   Question                                                 Number of docs.
T1         Do antioxidants help female subfertility?                41
T5         Do sealants prevent dental decay in permanent teeth?     45
T8         Does melatonin help treat and prevent jetlag?            39
T10        Does traction help low back pain?                        37
Table 1: Health topics in the user study.

3. Experimental setup

We hypothesise that reference values (created by expert annotators using formal guidelines) will correlate with users' judgements. To test this, we conducted a crowd-sourced user study whereby participants provided credibility judgements for webpages.

3.1. Dataset
We utilised a pre-existing dataset from the medical domain that had originally been compiled by Pogacar et al. [4] and later extended by Zimmerman et al. [25]. We extracted 162 screenshots of webpages from it, such that the selected webpages provide answers spanning four distinct health-related topics, as detailed in Table 1². Each participant was presented with a full-scale screenshot of a randomly selected webpage and was asked to assess the webpage's credibility on a 7-point Likert scale. Each webpage was evaluated at least 5 times by different participants to minimise personal bias [18, 10].

For these webpages, we also produced reference annotations generated by human assessors according to the guidelines from [12], detailed in Table 2. We recruited four different assessors³. For a given topic, the webpages were annotated by the same pair of assessors. Next, the two assessors responsible for each topic met to discuss and consolidate their annotations and generate a final set of labels. These final annotations were used as reference values in our user study.

² We did not use the raw HTML, since we consider visual elements key to the perception of credibility.
³ Three PhD students with backgrounds in British Studies, Computer Science and Information Science, respectively, and a Master's student in Information Science.

     Label   Guideline
G1   2       Source is a scientific paper, a medical publisher, a hospital/clinic, a government website, or a university.
G2   1       Document cites the information provided in its articles. It provides links or specific references to its sources, and it cites sources with credibility 2 (i.e., medical publications and/or lab studies).
G3   1       Document is written by an expert in the field/someone qualified to write the document (irrespective of publishing venue).
G4   0       The document is actually for advertising or marketing purposes. If so, the website might be biased, or a scam designed to trick people into fake treatments or into buying medical products that do not live up to their claims.
G5   0       The information is posted by a non-expert providing a medical product review or medical advice without proper citations (links/list of references).
G6   0       The website provides or states claims that go against well-known medical consensus (e.g., smoking cigarettes does not cause cancer).
NOTE: It is generally allowed to look up authors to check whether they have the required knowledge to be regarded as experts, and to look up websites to find out whether they are legitimate.
Table 2: Guidelines proposed in Fernández-Pichel et al. [12].

Figure 1: Pre-Task Questionnaire (left), which prompted the participant to give their self-assessed level of experience on the presented topic. Post-Study Questionnaire (right), which prompted the participant to enter demographic information such as age, gender, educational background and current occupation.

3.2. Variables

3.2.1. Independent variables

• Reference values: variable indicating the credibility level assigned by the human annotators according to the guidelines. There are three possible levels: 0 (non-credible), 1 (credible), and 2 (highly credible).

3.2.2. Dependent variables

• User credibility score: the credibility score assigned by the crowdworkers on a 7-point Likert scale.
• Time of completion (in minutes): the total amount of time it took a crowdworker to complete the assessment (measured from the moment the screenshot of the website was shown until successful completion).
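To make the variable set concrete, the sketch below shows one way a single assessment record could be represented; the class and field names are our own illustration (they are not taken from the study materials), and the exploratory variables described in Section 3.2.3 below are included as additional fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssessmentRecord:
    """One crowdsourced credibility assessment of a single webpage (illustrative schema)."""
    webpage_id: str             # identifier of the assessed webpage screenshot
    topic_id: str               # e.g. "T1", "T5", "T8", "T10"
    reference_label: int        # independent variable: 0 (non-credible), 1 (credible), 2 (highly credible)
    user_score: int             # dependent variable: 1-7 Likert judgement
    completion_time_min: float  # dependent variable: minutes from screenshot shown to completion
    familiarity: int            # exploratory: 1-5 self-reported topic familiarity (Section 3.2.3)
    justification: Optional[str] = None  # exploratory: optional free-text justification
```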
3.2.3. Descriptive and exploratory variables

We also studied some variables that can influence or are connected with the credibility scores gathered in the study (this relation is further explored in Section 4):

• Topic familiarity: in the pre-task questionnaire, see Figure 1 (left side), participants were asked about their prior knowledge of the topic.
• Personal data: in a post-study questionnaire, see Figure 1 (right side), we gathered additional information about the participants in the study (educational background, gender, and age).
• Justifications: we also offered participants the possibility of justifying their rating in their own words (free-text field).

Figure 2: Main-Study Screen, which presented participants with a screenshot of the assigned webpage. The bottom of the page presented a Likert scale to rate the perceived credibility of the website and a (non-mandatory) free-text input field to justify the judgement.

3.3. Procedure

Once participants had been informed of the goals of the study, its methodology, and the implications of their involvement, they gave their permission by signing a consent form. Next, they completed the study in three steps:

1. Each participant was randomly assigned one webpage from the collection. Before seeing the webpage's screenshot, they had to complete a pre-task questionnaire about their expertise on the topic of the website, see Figure 1 (left side).
2. Subjects were shown a screenshot of the entire webpage and had to assess its credibility on a 7-point Likert scale (from not credible at all to very credible), see Figure 2. They had no time limit to provide this estimate, and they could scroll through the entire screenshot and provide a free-text justification of their judgement (the justification was not mandatory; however, a large number of participants provided this feedback, see Section 4.6).
3. Before ending the study, participants were shown a post-study questionnaire, see Figure 1 (right side). Our main goal was to gather additional data, such as educational background, age, and gender.

Figure 3: Boxplot of the crowdsourced assessments for the three types of webpages presented (non-credible, credible and highly credible).

3.4. Participants

We recruited a total of 1,000 users to guarantee at least 5 judgements per webpage. We used the Prolific⁴ platform and each participant received £0.32 (equivalent to £9.60 per hour). Our participants were fluent English speakers residing in the United States or the United Kingdom. Participants' ages ranged from 18 to 85 years. 53% identified as female, 45% as male, and the remaining participants identified either as diverse or other. In terms of educational background, 40% had a bachelor's degree, 37% had completed secondary education, and only 2.6% of the participants reported a level of education below high school.

⁴ https://www.prolific.co/

4. Results

4.1. Reference credibility values vs users' judgements

RQ1 seeks to determine whether the current annotation guidelines, whose goal is to produce robust assessments of credibility for medical websites, serve as a proxy for end-users' perception of credibility [12]. We analysed the distribution of the crowdsourced judgements according to the three levels of reference values. As can be seen in Figure 3, there appears to be a relationship between the two variables. The participants' judgements tend to be higher when presented with webpages of increasing credibility (according to experts). This is confirmed by a Spearman's rank correlation (ρ = 0.26, p < 0.01), indicating weak agreement according to [26].
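As a rough illustration of how this check can be run, the sketch below computes Spearman's rank correlation between reference labels and crowd judgements; the variable names and toy values are hypothetical, and we assume the per-assessment pairs are available as two aligned lists.

```python
# Minimal sketch (hypothetical data): correlating expert reference labels (0-2)
# with crowdsourced credibility judgements (1-7) for the same assessments.
from scipy.stats import spearmanr

reference_labels = [0, 0, 1, 1, 2, 2, 0, 2]   # expert labels per assessment (toy values)
user_scores      = [5, 4, 4, 6, 7, 6, 5, 7]   # 7-point Likert judgements (toy values)

rho, p_value = spearmanr(reference_labels, user_scores)
print(f"Spearman's rho = {rho:.2f}, p = {p_value:.3f}")
```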
Despite this correlation, there are some signs of concern regarding RQ2. Webpages annotated as non-credible according to the guidelines were often perceived as reliable by the participants. This is reflected in the fact that non-credible documents received very high perception scores, with a median of 5. This confirms previous research findings that people tend to overestimate credibility and have problems identifying low-quality sites [27, 28]. In general, webpages labelled as credible or highly credible by the reference judgements were also considered to be of high quality by the crowdworkers. However, people struggled to detect content regarded as low quality by the reference annotations.

Summing up, regarding RQ1, we can conclude that the participants' judgements and the reference values derived from the guidelines are correlated. As for RQ2, we found that crowdworkers are less prone to errors when evaluating high-quality pages, but they struggle with pages of lower credibility.

4.2. Topic familiarity

Prior to completing the study, and to partially answer RQ3, we asked participants about their level of knowledge or familiarity with their assigned topic on a 5-point Likert scale (ranging from Not at all familiar to Extremely familiar, see Figure 1, left side).

Figure 4: In the left plot, we show crowdsourced credibility values grouped by familiarity with the topic. The right plot represents the deviations between the crowdsourced credibility values and the reference values (also grouped by familiarity). Positive (negative) values represent cases where users overestimated (underestimated) the credibility of the webpage.

Figure 4 (left side) shows the relation between levels of familiarity and the participants' judgements. It seems that the higher the familiarity, the higher the credibility judgements provided by participants. Spearman's correlation yielded ρ = 0.10 (p < 0.01), indicating a very weak correlation between the two variables [26].

To further explore the user study data, we also computed the deviation per level of familiarity between the reference values and the user study's judgements. First, we applied a Min-Max normalisation to both sets of scores. Then, the difference between the crowdsourced judgements and the reference values was computed: 0 represents a perfect match, while a positive (negative) value means that people overestimated (underestimated) credibility (see Figure 4, right side). From the figure, we might conclude that there is not a strong relation between familiarity and how effective users are at rating webpages. However, Spearman's correlation yielded ρ = 0.13 (p < 0.01). As a complementary analysis, we also computed the mean familiarity per topic (and its standard deviation): T1 has a mean familiarity of 1.55 (0.89), T5 of 1.79 (1.07), T8 of 2.44 (1.25), and T10 of 2.14 (1.15).
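The deviation scores used in Figure 4 (right side), and again in the per-topic and per-education analyses below, can be sketched as follows; the function and variable names are our own, and the values are hypothetical placeholders for the study data.

```python
import numpy as np

def min_max(scores):
    """Min-Max normalisation of a score vector to the [0, 1] range."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min())

# Hypothetical aligned vectors: expert reference labels (0-2) and crowd judgements (1-7).
reference = np.array([0, 1, 2, 2, 0, 1])
crowd     = np.array([5, 4, 7, 6, 5, 3])

# 0 means a perfect match; positive values indicate overestimated credibility,
# negative values indicate underestimated credibility.
deviation = min_max(crowd) - min_max(reference)
```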
4.3. Topic analysis

Figure 5: In the left plot, we show boxplots (per topic) of the crowdsourced assessments for the three types of webpages presented (non-credible, credible and highly credible). The right plot shows the deviations between the user credibility values and the reference values.

Figure 5 (left side) reports a topic-level analysis. As can be expected, there are individual differences among the topics. Spearman's test also revealed statistically significant correlations between reference credibility scores and crowdsourced credibility scores for all topics. However, the correlation for T1 was lower. These results fit with the familiarity scores described above, where T1 was shown to be the topic users knew least about.

Again, we also computed the deviation per topic between the reference values and the user study's judgements, see Figure 5 (right side). Spearman's test revealed a statistically significant correlation between both variables (ρ = 0.20, p < 0.01). An interesting finding is that for the topics users are more familiar with, T8 and T10, they tend to overestimate credibility.

4.4. Other user variables

To fully answer RQ3, we explored the relation between additional user variables (gender, age, and educational background) and the perception of credibility. For the first two variables, no significant correlations or revealing trends were found. However, for educational background (Figure 6), we found an interesting result: all groups were similarly good at estimating credibility, with the exception of the least educated (less than high school) and the most educated (doctorate) groups. Spearman's test yielded ρ = 0.05 with p = 0.12, providing no evidence of a correlation between educational level and the quality of the assessments, measured as the deviation between the crowdsourced and reference values.

Figure 6: Box plots (per educational background) representing the deviations between the user credibility values and the reference values. Positive (negative) values represent cases where users overestimated (underestimated) the credibility of the webpage.

4.5. Time of completion

We also analysed the time (in minutes) users needed to complete the assessment. Figure 7 shows that users who spent less time analysing the webpage (between 0 and 6 minutes) tended to deviate less from the reference values (deviation close to 0). We speculate that "overthinking" might be counterproductive for this task. Alternatively, the lower quality of the estimates at the right end of the graph could be due to other factors such as distractions. Related to this, previous studies showed that people who take more time on this type of task tend to be more influenced by visual elements and their prior knowledge [29]. In any case, the Spearman's correlation test revealed no statistically significant correlation (ρ = 0.009, p = 0.78) between completion time and the deviation between crowdsourced and reference judgements.

Figure 7: Box plots (per completion time, in minutes) of the deviations between the user credibility values and the reference values. Positive (negative) values represent cases where users overestimated (underestimated) the credibility of the webpage.

4.6. Analysis of justifications

We offered users the possibility of justifying their judgements. This option was actively used, with 93% of participants providing a textual explanation. This gave us valuable evidence to analyse the reasons behind credibility judgements. Yet, manual inspection was infeasible because we had thousands of data points. We therefore opted for exploiting the summarisation capabilities of current Large Language Models (LLMs). To that end, the justifications were grouped by the different levels of perceived credibility, and GPT-4 was provided with these textual extracts and asked to generate a summary for each level. The template used for prompting the LLM was as follows: "We are a group of scientists that have conducted an online survey on webpage credibility. For each webpage, we asked a human assessor to provide a credibility score from 1 to 7 (very low credible to very credible). Assessors could also provide a justification on why they assigned a given credibility score. Given a series of justifications between <>, I want you to generate an understandable summary. The summary is:".
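As an illustration only, the snippet below shows how this summarisation step could be reproduced with the OpenAI Python client; the helper name, the joining of the justifications, and the exact placement of the extracts inside the template are our assumptions rather than details reported in the paper.

```python
from openai import OpenAI  # assumes the official openai package (v1 interface) and OPENAI_API_KEY set

client = OpenAI()

# Prompt adapted from the template quoted above; the {justifications} placeholder
# position is an assumption about where the extracts were inserted between <>.
PROMPT_TEMPLATE = (
    "We are a group of scientists that have conducted an online survey on webpage credibility. "
    "For each webpage, we asked a human assessor to provide a credibility score from 1 to 7 "
    "(very low credible to very credible). Assessors could also provide a justification on why "
    "they assigned a given credibility score. Given a series of justifications between <>, "
    "I want you to generate an understandable summary. <{justifications}>. The summary is:"
)

def summarise_level(justifications, model="gpt-4"):
    """Summarise the free-text justifications gathered for one perceived-credibility level."""
    prompt = PROMPT_TEMPLATE.format(justifications=" | ".join(justifications))
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage: one call per Likert level (1-7), given justifications grouped beforehand.
# summary_for_level_1 = summarise_level(justifications_by_level[1])
```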
The resulting summaries are presented in Table 3. Some interesting patterns were observed in the provided explanations. Low credibility was usually associated with a poor visual appearance of the webpage. Moreover, several users mentioned the lack of reputable references as a key reason to mistrust the information. This suggests a direct relationship between the real perception of end-users and the existing guidelines [12], which incorporate criteria such as the presence of authoritative references. Regarding cases of high perceived credibility, users tend to trust professional appearance and well-structured content. They also mentioned the importance of the qualifications of the authors or the inclusion of credible sources. These two criteria were also considered in the original assessment guidelines (citing reputable sources or the expertise of the author). Crowdworkers mentioned governmental sites, e.g. the NHS or the CDC, as the most trustworthy sources. This also matches the criteria stated in the guidelines. This qualitative analysis provides tangible evidence to support current guidelines as proxies of users' perceptions of credibility.

Perceived Credibility Level   Summary of the Justifications
1 (low credibility)   Many participants cited unprofessional design, poor layout, and lack of reliable sources as reasons for low credibility scores. Additionally, some expressed distrust towards Wikipedia and the belief that the information could be biased or incorrect.
2   The justifications mainly indicate that the webpages appear amateur, unprofessional, and lacking credible sources or evidence. Additionally, many assessors express skepticism due to outdated information, poor design, and potential bias from advertisements or affiliate links.
3   The webpages were considered less credible due to factors such as unprofessional design, outdated information, informal language, lack of evidence or citations, and clickbait-like content. The credibility of some pages was questioned due to unfamiliar brands, personal opinions, and the absence of proper qualifications or accreditations.
4   Assessors found some webpages to be credible due to professional appearance, links to studies or association with reputable organizations, while others were seen as less credible due to informal language, lack of citations or references, and potential for errors. The credibility of some pages was difficult to judge without further investigation or knowledge of the subject matter.
5   The justifications highlight the presence of credible sources, professional appearance, and author qualifications as positive factors for credibility. However, some concerns are raised due to missing citations, outdated information, and potential biases.
6   Survey participants found the webpages credible due to their professional appearance, use of medical facts and references, reputable sources, well-structured content, and qualified authors. The credibility was also often influenced by personal experiences or previous knowledge about the subjects discussed.
7 (high credibility)   The majority of the justifications indicate that the webpages are credible due to their professional appearance, reputable sources, and being associated with trusted organizations such as the NHS, CDC, and various academic journals. Additionally, assessors mentioned the presence of scientific research, citations, author credentials, and detailed information as contributors to the credibility of the webpages.
Table 3: Summaries generated with LLMs of the justifications provided by the crowdworkers. We have highlighted some key words that served as justification for user judgements.

5. Discussion

In this study, we showed that there is a correlation between the annotations produced from existing credibility guidelines and end-users' perception of credibility. This highlights the value of existing guidelines [12] as proxies of credibility, thus endorsing these guidelines as a roadmap in the complex and subjective task of credibility tagging. We also demonstrated that this relation is topic-dependent, as users tend to deviate more from the reference ground truth when they are less familiar with the topic. Results also confirmed previous research suggesting that people tend to overestimate credibility and often struggle to identify sites labelled as low quality.

We also studied the influence of other variables, such as the time of completion or the educational background, and some interesting conclusions arose. Regardless of educational background, people have difficulties judging credibility. This even happens with individuals who have a strong educational background (for example, graduates who have often been trained in skills such as critical thinking). It also appears that "overthinking" and spending too much time on a judgement does not lead to better estimates of credibility. Free-text justifications were also inspected, confirming a direct relationship between certain elements from the credibility guidelines and users' perceptions.

6. Conclusions

In this paper, we have conducted a user study on people's perception of the credibility of online health information. First, we built on a previous study in the field to produce reference values based on a series of guidelines. We found a correlation between these values and the judgements collected in the study. However, some worrying facts were also found: people tend to overestimate the credibility of sites (which can be especially damaging in health information seeking), and educational background does not seem to have a direct effect on their perceptions. As future work, we want to differentiate between closely related concepts such as credibility (more subjective) and correctness (more factual) and study how they affect users' judgements.

Acknowledgements

The authors thank: i) the financial support supplied by the Xunta de Galicia - Consellería de Cultura, Educación, Formación Profesional e Universidades (Centro de investigación de Galicia accreditation 2019-2022 ED431G-2019/04 and Reference Competitive Group accreditation 2021-2024, ED431C 2022/19) and the European Union (European Regional Development Fund - ERDF) and ii) the financial support supplied by project PID2022-137061OB-C22 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund).
The third author thanks the financial support obtained from project SUBV23/00002 (Ministerio de Consumo, Subdirección General de Regulación del Juego). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).

References

[1] S. Shepperd, D. Charnock, B. Gann, Helping patients access high quality health information, BMJ 319 (1999) 764–766.
[2] R. J. Cline, K. M. Haynes, Consumer health information seeking on the internet: the state of the art, Health Education Research 16 (2001) 671–692.
[3] S. Fox, Health topics: 80% of internet users look for health information online, Pew Internet & American Life Project, 2011.
[4] F. A. Pogacar, A. Ghenai, M. D. Smucker, C. L. Clarke, The positive and negative influence of search results on people's decisions about the efficacy of medical treatments, in: Proceedings of the ACM SIGIR Int. Conf. on Theory of Information Retrieval, 2017, pp. 209–216.
[5] G. Eysenbach, Infodemiology: The epidemiology of (mis)information, The American Journal of Medicine 113 (2002) 763–765.
[6] G. Eysenbach, J. Powell, O. Kuss, E.-R. Sa, Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review, JAMA 287 (2002) 2691–2700.
[7] E. V. Bernstam, D. M. Shelton, M. Walji, F. Meric-Bernstam, Instruments to assess the quality of health information on the world wide web: what can our patients actually use?, International Journal of Medical Informatics 74 (2005) 13–19.
[8] N. Vigdor, Man fatally poisons himself while self-medicating for coronavirus, doctor says, 2020. URL: https://www.nytimes.com/2020/03/24/us/chloroquine-poisoning-coronavirus.html, [accessed June 9, 2022].
[9] B. J. Fogg, Persuasive technologies, Communications of the ACM 42 (1999) 26–29.
[10] C. Hahnel, F. Goldhammer, U. Kröhne, J. Naumann, The role of reading skills in the evaluation of online information gathered from search engine environments, Computers in Human Behavior 78 (2018) 223–234.
[11] M. Kąkol, M. Jankowski-Lorek, K. Abramczuk, A. Wierzbicki, M. Catasta, On the subjectivity and bias of web content credibility evaluations, in: Proceedings of the 22nd International Conference on World Wide Web, 2013, pp. 1131–1136.
[12] M. Fernández-Pichel, S. Meyer, M. Bink, A. Frummet, D. E. Losada, D. Elsweiler, Improving the reliability of health information credibility assessments, in: Proceedings of the 3rd Workshop on Reducing Online Misinformation through Credible Information Retrieval 2023, co-located with the 45th European Conference on Information Retrieval (ECIR 2023), 2023, pp. 43–50. URL: https://ceur-ws.org/Vol-3406/paper4_jot.pdf.
[13] B. J. Fogg, Prominence-interpretation theory: Explaining how people assess credibility online, in: CHI'03 Extended Abstracts on Human Factors in Computing Systems, 2003, pp. 722–723.
[14] J. Unkel, A. Haas, The effects of credibility cues on the selection of search engine results, Journal of the Association for Information Science and Technology 68 (2017) 1850–1862.
[15] S. M. Shariff, A review on credibility perception of online information, in: 2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), IEEE, 2020, pp. 1–7.
[16] A. Bodaghi, K. A. Schmitt, P. Watine, B. C. Fung, A literature review on detecting, verifying, and mitigating online misinformation, IEEE Transactions on Computational Social Systems (2023).
[17] A. L. Ginsca, A. Popescu, M. Lupu, et al., Credibility in information retrieval, Foundations and Trends in Information Retrieval 9 (2015) 355–475.
[18] D. H. McKnight, C. J. Kacmar, Factors and effects of information credibility, in: Proceedings of the Ninth International Conference on Electronic Commerce, 2007, pp. 423–432.
[19] M. Viviani, G. Pasi, Credibility in social media: opinions, news, and health information—a survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2017) e1209.
[20] B. J. Fogg, C. Soohoo, D. R. Danielson, L. Marable, J. Stanford, E. R. Tauber, How do users evaluate the credibility of web sites? A study with over 2,500 participants, in: Proceedings of the 2003 Conference on Designing for User Experiences, 2003, pp. 1–15.
[21] M. S. Eastin, Credibility assessments of online health information: The effects of source expertise and knowledge of content, Journal of Computer-Mediated Communication 6 (2001) JCMC643.
[22] C. N. Wathen, J. Burkell, Believe it or not: Factors influencing credibility on the web, Journal of the American Society for Information Science and Technology 53 (2002) 134–144.
[23] S. Sikdar, B. Kang, J. O'Donovan, T. Höllerer, S. Adalı, Understanding information credibility on Twitter, in: 2013 International Conference on Social Computing, IEEE, 2013, pp. 19–24.
[24] S. K. Sikdar, B. Kang, J. O'Donovan, T. Höllerer, S. Adalı, Cutting through the noise: Defining ground truth in information credibility on Twitter, Human 2 (2013) 151–167.
[25] S. Zimmerman, A. Thorpe, C. Fox, U. Kruschwitz, Privacy nudging in search: Investigating potential impacts, in: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, 2019, pp. 283–287.
[26] N. S. Chok, Pearson's versus Spearman's and Kendall's correlation coefficients for continuous data, Ph.D. thesis, University of Pittsburgh, 2010.
[27] J. Schwarz, M. Morris, Augmenting web pages and search results to support credibility assessment, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2011, pp. 1245–1254.
[28] E. R. Carlson, Evaluating the credibility of sources: A missing link in the teaching of critical thinking, Teaching of Psychology 22 (1995) 39–41.
[29] M. Kattenbeck, D. Elsweiler, Understanding credibility judgements for web search snippets, Aslib Journal of Information Management (2019).