<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Leveraging Artificial Intelligence and Large Language Models for Fake Content Detection in Digital Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andriy Matviychuk</string-name>
          <email>matviychuk@kneu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Derbentsev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitalii Bezkorovainyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana Kmytiuk</string-name>
          <email>kmytiuk.tetiana@kneu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Hostryk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kyiv National Economic University named after Vadym Hetman</institution>
          ,
          <addr-line>Beresteysky Ave. 54/1, 03057 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Odesa National Economic University</institution>
          ,
          <addr-line>Preobrazhenskaya Str. 8, 65082 Odesa</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The rapid proliferation of misinformation and fake news across online platforms has become a significant challenge, necessitating the development of advanced detection methods. This study explores the application of BERT-based models, including RoBERTa, DistilBERT, and XLM-RoBERTa, for the identification of fake news. Using diverse datasets (WELFake and PolitiFact), our approach involves fine-tuning these pre-trained models with minimal text preprocessing to preserve linguistic nuances. The models were evaluated based on their accuracy, F1-score, and computational efficiency, with experiments conducted on Google Colab using NVIDIA GPUs for acceleration. RoBERTa demonstrated the highest accuracy on the WELFake dataset, while DistilBERT achieved the best performance on the more concise PolitiFact dataset, highlighting the importance of matching models to dataset characteristics. XLM-RoBERTa, with its multilingual capabilities, showed strong generalization on diverse data but faced challenges with domain-specific tasks. The results underscore that model selection should be tailored to the specifics of the dataset and available computational resources, offering valuable insights for deploying effective fake news detection systems.</p>
      </abstract>
      <kwd-group>
        <kwd>fake news detection</kwd>
        <kwd>text classification</kwd>
        <kwd>risk information</kwd>
        <kwd>artificial intelligence</kwd>
        <kwd>deep learning</kwd>
        <kwd>natural language processing</kwd>
        <kwd>BERT-like models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The proliferation of fake content in electronic media has become a critical challenge in our
increasingly digitized world. From misinformation and disinformation to sophisticated deepfakes,
the spread of false or misleading content poses significant threats to social cohesion, democratic
processes, and individual decision-making. As the volume and complexity of digital content continue
to grow exponentially, traditional methods of fact-checking and content verification struggle to keep
pace, necessitating the development of more advanced, automated approaches to identifying fake
content [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        In recent years, the field of Natural Language Processing (NLP) has witnessed remarkable
advancements, particularly in the domain of Large Language Models (LLMs). These sophisticated
Artificial Intelligence (AI) systems, trained on vast corpora of text data, have demonstrated an
unprecedented ability to understand and generate human-like text, making them promising
candidates for tackling the fake content detection challenge. Models like BERT (Bidirectional
Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and their variants have set new benchmarks in various NLP tasks, including
text classification, sentiment analysis, and question answering [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3-6</xref>
        ].
      </p>
      <p>
        The potential of LLMs in identifying fake content lies in their capacity to capture subtle linguistic
patterns, contextual nuances, and semantic relationships that may be indicative of fabricated or
misleading information. By leveraging pre-trained models and fine-tuning them on specific datasets
related to fake content, researchers and developers can create powerful tools for automated content
authenticity verification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This article explores the application of large language models, with a
focus on BERT and its variations, in the detection of fake content across various electronic media
platforms. We will delve into the process of adapting these pre-trained models to the specific task of
fake content identification, discussing the methodology of retraining several last layers on custom
datasets chosen for this purpose.
      </p>
      <p>
        The approach of fine-tuning pre-trained models offers several advantages in the context of fake
content detection. Firstly, it allows us to benefit from the rich language understanding already
encoded in these models, which have been trained on diverse and extensive datasets. Secondly, it
provides a more efficient and resource-effective method compared to training models from scratch,
which is especially valuable when working with limited labelled data specific to fake content [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        However, the application of LLMs in this domain is not without challenges. Issues such as model
bias, the need for continual updating to keep pace with evolving disinformation tactics, and the
ethical implications of automated content analysis must be carefully considered [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Moreover, the
effectiveness of these models can vary depending on the type and source of fake content,
necessitating a nuanced approach to model selection and fine-tuning.
      </p>
      <p>Throughout this article, we will examine the architecture of the chosen pre-trained models, detail
the process of dataset preparation and model fine-tuning, and present a comprehensive analysis of
the results obtained. We will also discuss real-world applications, limitations of the current approach,
and potential future developments in this rapidly evolving field.</p>
      <p>By exploring the use of LLMs for the detection of fake content, this study contributes to broader
efforts to combat disinformation in the digital age. In an increasingly complex information
environment, developing advanced AI-based tools for content verification is not only a technical
challenge, but also an important measure to ensure the reliability and credibility of information.</p>
      <p>It should be noted that the ongoing war in Ukraine provides a stark
illustration of the power and danger of fake content in modern warfare and international relations.
The onset of the full-scale phase of the hostilities has been marked by unprecedented levels of
information warfare, with a flow of fakes and disinformation flooding social media platforms, news
feeds, and messaging apps.</p>
      <p>The dissemination of fake content in this context ranges from fabricated stories about Ukrainian
aggression to doctored videos purporting to show military actions that never occurred. This barrage
of false information has not only complicated the international
situation, but has also affected public opinion, potentially influencing policy decisions and
humanitarian aid efforts.</p>
      <p>The situation highlights the critical need for reliable, rapid detection systems for fake content, as
the consequences of unverified disinformation in such high-stakes geopolitical scenarios can be
severe and far-reaching.</p>
      <p>The objective of our study is to develop a set of fake news identification models based on
pretrained BERT models by fine-tuning the last few layers, and compare their performance.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>
        With advancements in Machine Learning (ML), Deep Learning (DL), and LLMs, researchers have
developed various methods to identify false information accurately. Recent studies in this domain
have explored different approaches, ranging from traditional machine learning models to advanced
neural networks, hybrid models, and even explainable AI. Several overviews of ML and DL
approaches to identifying fake content in digital media have recently been published
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10-13</xref>
        ].
      </p>
      <p>
        Harris et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] explore the emergence of information pollution and the infodemic resulting from
the widespread use of digital technologies on online social networks, blogs, and websites. They
highlight the negative consequences of the malicious broadcast of misleading content, including
social unrest, economic impacts, and threats to national security and user safety. The authors
critically evaluate existing fake news detection (FND) methods, emphasizing the lack of
multidisciplinary approaches and theoretical considerations in current research. They argue for a
more comprehensive analysis of FND through various fields such as linguistics, healthcare, and
communication, while also examining the potential of pre-trained transformer models for
multilingual, multidomain, and multimodal FND. The authors suggest future research directions that
focus on large, diverse datasets and the integration of human cognitive abilities with AI to combat
fake news and AI-generated content.
      </p>
      <p>
        Hu et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provide a comprehensive overview of fake news detection by analyzing its diffusion
process through three intrinsic characteristics: intentional creation, heteromorphic transmission,
and controversial reception. The authors classify existing detection approaches based on these
characteristics and discuss the technological trends that are shaping this research field. They
highlight the importance of designing effective and explainable detection mechanisms and offer
insights into future research directions, helping to advance the understanding and development of
fake news detection strategies.
      </p>
      <p>
        Alghamdi et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] present a comparative study of different approaches to fake news detection.
The authors evaluate the performance of traditional ML methods such as Support Vector Machines
(SVMs) and Random Forests (RFs) alongside more advanced DL models like CNN and Long
Short-Term Memory (LSTM) networks. Their research highlights the superiority of deep learning models,
particularly LSTM, in capturing the sequential nature of text data and achieving higher accuracy in
fake news detection. The study also emphasizes the importance of feature selection and engineering
in improving model performance, suggesting that a combination of content-based and metadata
features can lead to more robust detection systems.
      </p>
      <p>
        Hamed et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] offer a comprehensive review of fake news detection approaches, focusing on
the challenges associated with datasets, feature representation, and data fusion. The authors
critically analyze existing studies, highlighting the limitations of current datasets, which often lack
diversity and real-world applicability. They discuss various feature representation techniques, from
traditional bag-of-words models to more sophisticated word embeddings and contextual
representations. The paper also explores the potential of multi-modal approaches that combine
textual, visual, and social context information for more accurate fake news detection. The authors
conclude by identifying key research gaps and suggesting future directions, including the need for
more robust and diverse datasets, improved feature extraction methods, and the integration of
explainable AI techniques to enhance the interpretability of FND models.
      </p>
      <p>
        For example, in the article [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the explainability of decision-making in the field of text analysis
is ensured by the use of semiotic AI tools, namely fuzzy logic. In the article [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], explainable AI was
implemented based on an artificial neural network, which provided the rationale for the formation
of logical inference. Additional advantages in the interpretability of artificial intelligence can be
provided by combining both such approaches, based on semiotic and biological principles of
constructing AI systems and implemented in neuro-fuzzy hybrid systems, as shown in [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ].
      </p>
      <p>
        It is also worth noting the studies that have focused on comparing traditional ML and DL
approaches for fake news detection. Thus, the authors of the review [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] compare the performance of
ML algorithms such as Naïve Bayes, Logistic Regression, SVM, and RNNs. They noted that SVM and
Naïve Bayes outperform the other models in terms of classification efficiency. This approach
addresses the growing issue of misinformation on social media, where users often perceive content
as reliable without verification.
      </p>
      <p>
        In contrast, DL techniques have gained attention for their ability to automatically extract features
from text data. Nasir et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] employed CNNs and RNNs for fake news detection, showing that
CNNs excel at capturing local patterns in text, while RNNs, particularly LSTM models, are better at
understanding sequential information. The combination of these two models led to superior results.
      </p>
      <p>
        Tipper et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] provide a comprehensive review of video deepfake detection techniques using
hybrid CNN-LSTM models. The paper systematically investigates feature extraction approaches and
widely used datasets, while evaluating model performance across various datasets and identifying
factors influencing detection accuracy. The authors also compare CNN-LSTM models with
non-LSTM approaches, discuss implementation challenges, and propose future research directions for
improving deepfake detection.
      </p>
      <p>
        Paka et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] introduced Cross-SEAN, a semi-supervised neural attention model for detecting
COVID-19 fake news on Twitter, leveraging both labelled and unlabelled data. Their approach, which
incorporates external knowledge from trusted sources, achieved significant performance
improvement over seven state-of-the-art models. Despite some limitations, such as potential biases
in external knowledge, the model shows promising results, particularly with its real-time application
in the Chrome-SEAN extension, designed to label fake tweets and collect user feedback for
continuous improvement.
      </p>
      <p>
        Recent studies show that the use of LLMs such as GPT and BERT has led to more refined
approaches in fake news detection [
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25">22-25</xref>
        ]. These models enhance the ability to understand the
context, semantics, and intricate relationships within news articles, which are essential for
distinguishing between truthful and deceptive content. By leveraging deep learning techniques,
LLMs have significantly improved the accuracy and effectiveness of fake news detection systems,
making them more robust in combating misinformation in the digital landscape.
      </p>
      <p>
        For instance, Radhi et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
          ] examine the application of DL methods, including
transformer-based models like BERT, to detect fake news. Their research highlights the growing impact of
misleading content on social media platforms such as Facebook, Twitter, Instagram, and WhatsApp,
and emphasizes the urgency of addressing the problem of fake news, particularly in the context of
psychological warfare and revenue-driven clickbait.
      </p>
      <p>
        Kaliyar et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] propose FakeBERT, a BERT-based model that combines BERT with a CNN to
handle ambiguity in news content. This model achieves a remarkable accuracy of 98.90%,
outperforming existing models by using bidirectional training to capture semantic and long-distance
dependencies, thus improving classification performance.
      </p>
      <p>
        Similarly, Alnabhan and Branco [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] present BERTGuard, a multi-domain fake news detection
system that employs a two-tiered approach for domain classification and domain-specific news
validity verification. This system demonstrates its effectiveness through rigorous testing on various
datasets and incorporates strategies to mitigate class imbalance, enhancing its reliability and
generalizability.
      </p>
      <p>
        Dhiman et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] propose a novel framework called GBERT, combining GPT and BERT to tackle
the problem of fake news detection. The model's high performance, achieving 95.30% accuracy and
a 96.23% F1 score, underscores its potential to address the challenges posed by fake news in the digital
era.
      </p>
      <p>Overall, these diverse approaches underline the evolving nature of research in fake news
detection. While traditional ML models still provide a foundation, the rapid advancements in deep
learning, LLMs, and hybrid methods have expanded the capabilities to combat disinformation. The
integration of explainable AI and adversarial training techniques ensures that these models remain
both transparent and robust, helping to build trust in automated fake news detection systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. BERT-like models</title>
        <p>
          In our study, the methodology focuses on leveraging a pre-trained BERT (introduced by Devlin et al.
in 2018 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]) and its modifications for fake news detection. BERT is a powerful transformer-based
model known for its deep bidirectional nature, which allows it to understand the context of words
in a text by looking both to the left and right of a given token.
        </p>
        <p>Due to its ability to encode rich semantic information from large text corpora, BERT has been a
popular choice for various NLP tasks, including text classification, sentiment analysis, and fake news
detection. Here, the goal is to fine-tune BERT for classifying news articles into "real" or "fake"
categories, aiming for accurate detection of misleading information.</p>
        <p>At its core, BERT utilises a multi-layer bidirectional Transformer encoder. This bidirectional
approach enables the model to consider context from both directions simultaneously, which is in
stark contrast to traditional left-to-right language models. The standard BERT model (BERT base)
comprises 12 transformer layers (encoders), 12 attention heads, and about 110 million parameters.
The larger variant (BERT large) has 24 layers, 16 attention heads, and 340 million parameters (Fig. 1).
Residual connections and layer normalization are used to stabilize training and
improve gradient flow through the network.</p>
        <p>The input representation in BERT is a combination of three embeddings: token embeddings,
segment embeddings, and position embeddings. Token embeddings represent individual words or
subwords, segment embeddings differentiate between pairs of sentences, and position embeddings
provide information about the token's position in the sequence. These embeddings are summed to
produce the final input representation.</p>
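<p>The summation of the three embeddings can be sketched as follows (a toy illustration with made-up low-dimensional vectors, not the actual 768-dimensional BERT embeddings):</p>

```python
# Toy sketch: BERT's input representation for each token is the
# elementwise sum of its token, segment, and position embeddings.
# The 4-dimensional vectors below are illustrative values only.

def combine_embeddings(token_emb, segment_emb, position_emb):
    """Sum the three per-token embedding vectors elementwise."""
    return [t + s + p for t, s, p in zip(token_emb, segment_emb, position_emb)]

token_emb = [1.0, 2.0, 3.0, 4.0]      # subword identity
segment_emb = [0.0, 0.0, 1.0, 1.0]    # sentence A vs. sentence B
position_emb = [0.1, 0.2, 0.3, 0.4]   # position in the sequence

# Final per-token input: elementwise sum of the three vectors.
input_repr = combine_embeddings(token_emb, segment_emb, position_emb)
```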
        <p>BERT's transformer layers consist of multi-head self-attention mechanisms and feed-forward
neural networks. The self-attention mechanism allows the model to assign varying importance to
different words in the input when processing each word, capturing complex relationships within the
text. The feed-forward networks then refine this processed information, applying non-linear
transformations that enhance the model's capacity to recognize and learn complex patterns.</p>
        <p>BERT's pre-training process involves two novel unsupervised tasks. The first is Masked Language
Modeling (MLM), where the model attempts to predict randomly masked tokens in the input
sequence (Fig. 2).
This task forces the model to consider context from both directions, enhancing its bidirectional
understanding. The second task is Next Sentence Prediction (NSP), where the model learns to predict
whether two sentences naturally follow each other, fostering a grasp of the relationships between
sentences.</p>
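<p>The MLM corruption step can be illustrated with a small sketch (a simplified stand-in for BERT's actual masking scheme, which additionally substitutes random tokens or keeps some originals):</p>

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace roughly mask_prob of the tokens with [MASK]; the
    recorded originals serve as prediction targets for the loss."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if mask_prob > rng.random():
            targets[i] = tok          # remember the original token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens(["the", "news", "is", "fake", "today"])
# With this seed, the first token is masked and becomes a target.
```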
        <p>One of BERT's key strengths is its ability to generate contextualised word embeddings. Unlike
static word embeddings, BERT's representations for a given word can vary depending on the
surrounding context, capturing nuanced word usage and polysemy effectively.</p>
        <p>The fine-tuning process allows BERT to be adapted to a wide range of downstream tasks. By
adding task-specific layers to the pre-trained BERT model and fine-tuning on task-specific data,
researchers can achieve state-of-the-art results on various NLP tasks, including question answering,
sentiment analysis, text classification, summarisation, and named entity recognition.</p>
        <p>BERT's impact extends beyond its architecture. It has sparked a new paradigm in NLP,
demonstrating the power of unsupervised pre-training on large corpora followed by supervised
fine-tuning. This approach has led to the development of numerous BERT variants and inspired new
research directions in contextual language modeling.</p>
        <p>Since its release, various modifications and improvements have been introduced to address
specific limitations and further enhance the model's performance on a range of NLP tasks. Some of
the notable modifications include RoBERTa, DistilBERT, and XLM-RoBERTa, each designed with
unique features to optimize BERT's efficiency, scalability, and multilingual capabilities.</p>
        <p>
          RoBERTa (a Robustly Optimized BERT pretraining Approach by Liu et al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]) was developed to
address some of the original training challenges in BERT. RoBERTa builds upon the BERT
architecture by using more training data and a larger number of training steps, along with other
optimizations like removing the Next Sentence Prediction objective.
        </p>
        <p>Instead of focusing on the relationships between sentence pairs, RoBERTa concentrates purely
on the Masked Language Modeling objective, which has shown to be more effective for a wide range
of downstream NLP tasks. Additionally, RoBERTa utilizes dynamic masking, which allows for
different masked tokens during each epoch, offering a more diverse learning experience. As a result,
RoBERTa has consistently outperformed BERT on various benchmarks, making it a preferred choice
for tasks like text classification and sentiment analysis.</p>
        <p>
          DistilBERT (by Sanh et al. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]) is another significant modification aimed at making BERT lighter
and faster while retaining most of its performance capabilities. Developed using a technique called
knowledge distillation, DistilBERT is approximately 60% of the size of BERT, making it faster during
both training and inference. In knowledge distillation, a smaller model (the student model) is trained
to reproduce the behavior of a larger pre-trained model (the teacher model).
        </p>
        <p>This process enables the student model, DistilBERT in this case, to learn a more compact
representation of the language while preserving 97% of BERT's language understanding abilities.
DistilBERT's smaller size makes it particularly suitable for scenarios where computational resources
are limited or where real-time performance is critical, such as in mobile or edge computing
applications.</p>
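<p>The distillation objective can be sketched as a cross-entropy between the teacher's softened output distribution and the student's (a schematic toy example with invented probabilities, not the full DistilBERT training loss, which also combines MLM and embedding-alignment terms):</p>

```python
import math

def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of the student's distribution against the
    teacher's soft targets; lower means closer mimicry."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [0.8, 0.2]          # teacher's soft fake/real scores (made up)
close_student = [0.75, 0.25]  # student that mimics the teacher
far_student = [0.3, 0.7]      # student that disagrees

# The mimicking student incurs the lower distillation loss.
assert soft_cross_entropy(teacher, far_student) > soft_cross_entropy(teacher, close_student)
```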
        <p>XLM-RoBERTa (Cross-lingual Language Model) is an extension of the BERT architecture
designed for multilingual tasks, building on the success of both RoBERTa and the earlier XLM.
XLM-RoBERTa is pre-trained on a large-scale multilingual corpus covering over 100 languages, making it
capable of handling cross-lingual understanding and translation tasks more effectively.</p>
        <p>The model learns representations that are common across languages, which allows it to perform
well on tasks involving low-resource languages by transferring knowledge from high-resource
languages.</p>
        <p>
          Proposed by Lan et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], ALBERT (A Lite BERT) addresses BERT's limitations of model size
and training time. It introduces parameter-reduction techniques like factorized embedding
parameterization and cross-layer parameter sharing. Despite having fewer parameters, ALBERT
achieves state-of-the-art results on several benchmarks while being more efficient.
        </p>
        <p>
          Developed by Clark et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], ELECTRA (Efficiently Learning an Encoder that Classifies Token
Replacements Accurately) introduces a new pre-training task where the model learns to distinguish
between real input tokens and fake tokens generated by a small masked language model. This
approach is more sample-efficient than BERT's masked language modeling, allowing ELECTRA to
achieve strong performance with less computation.
        </p>
        <p>Each of these modifications brings unique strengths to the BERT family of models. RoBERTa's
focus on robust training has made it highly accurate, but it comes with increased computational
requirements due to the larger dataset and training time. DistilBERT addresses the issue of
computational expense by providing a smaller, faster alternative, making it a practical option for
deployment in environments where resources are constrained. XLM-RoBERTa, meanwhile, opens
the door to advanced multilingual applications, offering a model that can understand and process a
variety of languages effectively.</p>
        <p>In this paper we used both BERT base model and its modifications (RoBERTa, DistilBERT, and
XLM-RoBERTa).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. BERT-based classification pipeline</title>
        <p>The pipeline starts with data collection and pre-processing, where text data is cleaned and tokenized.
This involves removing any special characters, URLs, and unnecessary whitespace. Tokenization is
done using the BERT tokenizer, which converts the text into a format that the BERT model can
handle, specifically by converting words into tokens, adding special tokens like [CLS] and [SEP], and
creating attention masks that help the model focus on relevant parts of the input data.</p>
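<p>A simplified sketch of this encoding step (using naive whitespace splitting in place of BERT's WordPiece tokenizer; in practice one would use, e.g., the Hugging Face transformers tokenizer instead):</p>

```python
def encode(text, max_len=8, pad_token="[PAD]"):
    """Toy encoder: add [CLS]/[SEP], truncate or pad to max_len,
    and build the attention mask (1 = real token, 0 = padding)."""
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    tokens = tokens[:max_len]
    attention_mask = [1] * len(tokens)
    while max_len > len(tokens):
        tokens.append(pad_token)
        attention_mask.append(0)
    return tokens, attention_mask

tokens, mask = encode("Breaking news story")
# tokens: ['[CLS]', 'breaking', 'news', 'story', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
# mask:   [1, 1, 1, 1, 1, 0, 0, 0]
```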
        <p>The data is then split into training, validation, and test sets to ensure that the model can be
properly evaluated. The pre-trained BERT-like model is fine-tuned on the training dataset.
Fine-tuning involves using the general knowledge gained during the model's initial pre-training on a large
corpus and tailoring it to the specific task of detecting fake news. In this study, we freeze all layers
of the model except the last few encoders and the softmax classifier, which are retrained on our
dataset (Fig. 3).
The model is trained on the labelled dataset, adjusting its weights using the cross-entropy loss by
the Adam optimizer. During training, the learning rate is carefully controlled using a scheduler,
starting from a small value to ensure stable updates and avoid overfitting.</p>
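<p>The layer-freezing strategy can be expressed schematically (a framework-agnostic sketch; in PyTorch, for example, freezing corresponds to setting requires_grad to False on the parameters of the frozen layers):</p>

```python
def freeze_all_but_last(layer_names, n_trainable=2):
    """Mark only the last n_trainable layers as trainable."""
    flags = {name: False for name in layer_names}
    for name in layer_names[-n_trainable:]:
        flags[name] = True
    return flags

# BERT base: 12 encoder layers plus the classification head.
layers = ["encoder_%d" % i for i in range(12)] + ["classifier"]
flags = freeze_all_but_last(layers, n_trainable=3)
# encoder_10, encoder_11 and the classifier remain trainable;
# the other ten encoders stay frozen.
```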
        <p>Our study applies early stopping based on the validation loss to avoid training for too many
epochs, which can lead to overfitting. This way, the model is more likely to generalize well on
unknown data. The evaluation of the fine-tuned model is performed using metrics such as accuracy
and F1-score.</p>
        <p>These metrics provide a comprehensive understanding of the model's performance in detecting
fake news. The F1-score, as the harmonic mean of precision and recall, provides a balanced measure
of performance and insight into the types of errors the model may make.</p>
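<p>For completeness, these metrics can be computed from scratch for the binary fake/real setting (the label convention here, 1 = "fake", is illustrative):</p>

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
# precision = recall = 2/3, so the F1-score is also 2/3.
```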
        <p>Since BERT was pre-trained on a large corpus, it can leverage the linguistic patterns learned
during this general training, thereby requiring less labelled data for the specific task of fake news
detection. This is especially advantageous given the scarcity of high-quality labelled fake news
datasets. Fine-tuning allows the model to adapt to the nuances of fake news language without starting
from scratch, making this approach efficient and effective.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Datasets and software implementation</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>To effectively train and test BERT-type models for fake news detection, we utilized a variety of
datasets that are widely recognized in the field. These datasets (FakeNewsNet, in particular its PolitiFact
subset, and the WELFake dataset) offer diverse contexts and sources of fake news, allowing for robust
model evaluation. Each dataset has unique characteristics that contribute to a comprehensive
training and testing process, helping to ensure that the models generalize well across different types
of misinformation.</p>
        <p>
          FakeNewsNet (PolitiFact) [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] is a well-established dataset that combines news content with
social context to facilitate the study of fake news detection. It includes news articles verified by the
PolitiFact fact-checking website, where each article is classified as either "fake" or "real" based on
professional verification. This dataset not only includes the text of the news articles but also
metadata such as user engagement and social media activity around each news item.
        </p>
<p>The social context allows models to capture the diffusion patterns of fake news, which is crucial
for understanding how misinformation spreads online. By training BERT-type models on the textual
content supplemented by this social context, detection can draw on both linguistic features and
social spread patterns.</p>
<p>We used the PolitiFact subset, which contains around 1,200 records with an average text length
of about 15 words, offering a moderate level of detail for each news item. This shorter length
allows the BERT models to focus on concise stylistic and contextual indicators of fake news.</p>
        <p>
          WELFake (Web Evaluated Fake News) [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] is significantly larger, with over 70,000 news articles.
Its structure is simpler, focusing primarily on the text, title of the articles and a label field that
classifies each article as "fake" or "real". This minimal structure makes WELFake an ideal dataset for
large-scale training, enabling models to learn from a vast variety of textual examples. Despite its
large size, the dataset is not entirely balanced, with a higher number of fake news records compared
to real news. This imbalance requires careful consideration during model training, such as using
class weighting or oversampling techniques to prevent the model from overfitting to the majority
class. The average length of texts in WELFake is around 540 words, which provides enough data for
models to learn linguistic patterns while keeping training time manageable.
        </p>
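<p>To make the imbalance handling mentioned above concrete, the following is a minimal sketch of inverse-frequency class weighting; the helper name and the toy label counts are illustrative, not from our pipeline, and the resulting weights would typically be passed to a weighted cross-entropy loss:</p>

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: weight_c = total / (num_classes * count_c),
    so the minority class contributes as much to the loss as the majority."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# Toy imbalanced label set mimicking WELFake: 1 = fake (majority), 0 = real
labels = [1] * 700 + [0] * 300
weights = class_weights(labels)  # minority class 0 receives the larger weight
```

Oversampling the minority class until the counts match would be the alternative route; class weighting avoids duplicating records at the cost of a slightly noisier loss.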
        <p>By using these datasets, we were able to train BERT-type models with a diverse range of textual
inputs and associated features. This diversity ensures that the models are not only capable of
recognizing the typical writing styles and topics of fake news but also understand how false
information is often framed within a broader social context. Moreover, combining datasets with
large-scale examples like WELFake and more specific examples like those in PolitiFact ensures a
balance between the volume of data and the richness of context. This approach helps improve the
generalization abilities of the models, making them better suited to real-world applications where
misinformation can take on many different forms and reach audiences through various channels.</p>
<p>Thus, after removing duplicates, the PolitiFact dataset contains about 1,000 short records with an
average length of 15 tokens, while the WELFake dataset contains about 50,000 records with an
average length of 540 tokens. Such diversification allows us to test the performance of
BERT-like models under fundamentally different conditions.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Software</title>
<p>In our study, we utilized a variety of software tools and libraries to train, test, and analyze the
BERT-type models in identifying fake news. These tools facilitated the entire pipeline from data
preprocessing and model training to evaluation and visualization of results. The combination of these
software solutions allowed us to leverage state-of-the-art techniques and streamline our workflow.</p>
        <p>PyTorch served as the core library for building, training, and fine-tuning our deep learning
models. As an open-source deep learning framework, PyTorch offers dynamic computational graphs
and an intuitive API, making it suitable for implementing complex BERT-like models. It also provided
robust support for GPU acceleration, which was crucial for training large language models efficiently
on the substantial datasets we used. The ease of integrating PyTorch with pre-trained models
through the Hugging Face Transformers library allowed us to fine-tune these models specifically for
the task of fake news detection.</p>
        <p>Pandas played a critical role in managing and pre-processing our datasets. Given the size and
complexity of datasets, Pandas' capabilities for data manipulation and analysis were invaluable. We
used it to load, filter, and clean data, ensuring that the text fields were formatted properly for input
into the models. Pandas also allowed us to explore dataset characteristics, such as class distribution
and text length, which helped guide our approach to model training and evaluation. Its versatility in
handling various data formats, including CSV and JSON, streamlined the process of preparing our
data.</p>
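<p>A minimal sketch of this cleaning step with Pandas follows; the column names and toy rows are illustrative stand-ins for the datasets' actual fields:</p>

```python
import pandas as pd

# Toy frame mimicking the datasets' layout: title, text, binary label (1 = fake).
df = pd.DataFrame({
    "title": ["A", "A", "B", None],
    "text":  ["body one", "body one", "body two", "body three"],
    "label": [1, 1, 0, 0],
})

# Drop exact duplicates and rows with missing fields.
clean = df.drop_duplicates().dropna().reset_index(drop=True)

# Explore the characteristics that guided training: class balance, text length.
class_counts = clean["label"].value_counts()
clean["n_words"] = clean["text"].str.split().str.len()
```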
        <p>Python 3.8 served as the primary programming language for this project, owing to its simplicity,
versatility, and extensive ecosystem of libraries. Python's flexibility enabled us to integrate diverse
tools seamlessly, from data pre-processing with Pandas to model training with PyTorch.
Additionally, Python's wide range of libraries for data visualization, like Matplotlib and Seaborn,
made it easier to conduct analysis of the results. Python's extensive
community support and documentation further facilitated the smooth implementation of
cutting-edge methods.</p>
        <p>Scikit-Learn was used for a range of pre-processing and evaluation tasks. This included splitting
datasets into training and testing samples, calculating various performance metrics like accuracy,
precision, recall, F1-score, and generating confusion matrices for deeper insights into model
predictions. Scikit-Learn's easy-to-use API allowed us to quickly compare different models and
preprocessing strategies, ensuring that we could iteratively refine our approach to achieve the best
results.</p>
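<p>The splitting and metric calls amount to a few lines; this sketch uses toy data rather than our actual pipeline:</p>

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Toy features and labels standing in for fake (1) / real (0) articles.
X = [[i] for i in range(8)]
y = [1, 0, 1, 0, 1, 0, 1, 0]

# Stratified split keeps the fake/real ratio equal in train and test samples.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# With model predictions in hand, each reported metric is a single call.
y_pred = y_te  # perfect predictions, for illustration only
acc = accuracy_score(y_te, y_pred)
f1 = f1_score(y_te, y_pred)
```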
        <p>Seaborn and Matplotlib were essential for data visualization throughout the project. Seaborn, with
its high-level interface, was used to create aesthetically pleasing and informative plots, such as
histograms of text lengths and confusion matrices, which helped us understand the distribution of
data and model performance at a glance. Matplotlib provided additional customization capabilities,
allowing us to tailor visualizations to our specific needs, such as adjusting axis scales or highlighting
specific data points.</p>
<p>Additionally, we leveraged the Hugging Face Transformers library to access pre-trained
BERT-type models and adapt them for our task. This library enabled us to import and fine-tune models
with minimal efforts, allowing us to focus on the nuances of the fake news detection problem rather
than the complexities of implementing models from scratch. The ease of integrating these models
with PyTorch through the Transformers library made it possible to quickly experiment with different
architectures and configurations.</p>
        <p>We leveraged Google Colab as the primary development environment for training and evaluating
our models. Google Colab is a cloud-based platform that offers a Jupyter notebook interface,
providing a powerful and convenient setting for executing Python code and running deep learning
experiments. One of the key advantages of Google Colab is its access to free GPUs and TPUs, which
significantly accelerated the training process for our BERT-type models.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental setup</title>
      <sec id="sec-5-1">
        <title>5.1. Final hyperparameter settings</title>
        <p>The final hyperparameter settings are presented in Table 1. These settings describe our setup for
fine-tuning pre-trained BERT-type models. A batch size of 16 provides a balance between memory
usage and training speed that is suitable for most GPUs. Input sequences are limited to 128 (64 for
PolitiFact) tokens, which is sufficient for our datasets. The model is trained for 5 epochs, allowing it
to train on the data multiple times without overfitting.</p>
        <p>The learning rate is small, allowing careful adjustment of the pre-trained weights. For
optimization, the AdamW optimizer is used, an improved version of Adam that properly implements
weight decay. CrossEntropyLoss serves as a loss function that is standard for many classification
tasks in natural language processing.</p>
        <p>To prevent overfitting, a dropout rate of 0.3 is applied, randomly deactivating 30% of neurons
during training. The optimization process takes advantage of GPU acceleration, in particular NVIDIA
L4, which significantly speeds up the computation compared to CPU. Google Colab serves as a
training environment, offering free access to GPUs for model development.</p>
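<p>The setup described above can be collected into a single configuration; this is a hypothetical mirror of Table 1, and the learning-rate value (2e-5) is a typical choice for BERT fine-tuning standing in for the exact table entry:</p>

```python
# Hypothetical configuration mirroring the Table 1 setup; the learning rate
# is an assumed typical value, not taken verbatim from the table.
FINE_TUNING_CONFIG = {
    "batch_size": 16,                                  # memory/speed balance
    "max_seq_len": {"WELFake": 128, "PolitiFact": 64},  # input length, tokens
    "epochs": 5,                                       # passes over the data
    "optimizer": "AdamW",          # Adam with decoupled weight decay
    "loss": "CrossEntropyLoss",    # standard for classification
    "dropout": 0.3,                # 30% of neurons deactivated in training
    "learning_rate": 2e-5,         # assumed typical value for BERT fine-tuning
    "device": "cuda",              # NVIDIA L4 GPU in Google Colab
}
```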
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Text preprocessing</title>
<p>Text preprocessing for our fake news identification task was intentionally minimal, leveraging the robust
capabilities of BERT-type models. These models are pre-trained on vast corpora of unrefined text,
allowing them to handle raw input effectively. This approach preserves the natural structure and
nuances of the text, which can be crucial for detecting subtle indicators of fake content.</p>
        <p>The primary preprocessing step was performed by the tokenizer specific to each BERT model
variant. These tokenizers are designed to break down text into subword units, handling
out-of-vocabulary words and maintaining semantic relationships. The tokenization effectively translates
raw text into a format that BERT models can process, without losing important linguistic
information.</p>
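<p>The subword idea can be illustrated with a toy greedy longest-match splitter; this is a deliberately simplified sketch of WordPiece-style tokenization, not the actual Hugging Face tokenizer, and the tiny vocabulary is invented for the example:</p>

```python
def wordpiece_split(word, vocab):
    """Toy greedy longest-match subword splitter: repeatedly take the longest
    vocabulary piece, prefixing continuation pieces with '##', as BERT-style
    tokenizers do for out-of-vocabulary words."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no known piece covers this position
        start = end
    return pieces

# Invented mini-vocabulary for illustration
vocab = {"mis", "##information", "##inform", "##ation", "news"}
```

For example, `wordpiece_split("misinformation", vocab)` yields `["mis", "##information"]`, preserving recognizable morphemes instead of discarding the unseen word.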
        <p>Our preprocessing pipeline focused mainly on preparing the data structure for input into the
model. This included binary encoding of labels, transforming the classification targets into a format
suitable for machine learning. In addition, duplicates and data with gaps were removed.</p>
        <p>We also concatenated various fields related to each piece of content, such as the author's name,
the main text of the article, its title, and the URL. This concatenation allows the model to consider
all relevant information simultaneously, potentially capturing relationships between different
aspects of the content that might indicate its authenticity or lack thereof.</p>
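<p>A minimal sketch of this concatenation step follows; the field names are illustrative of the datasets' columns, and missing fields are skipped rather than rendered as the string "None":</p>

```python
def build_model_input(record):
    """Concatenate the content-related fields into the single string that is
    fed to the tokenizer. Field names here are illustrative assumptions."""
    parts = [record.get(k) for k in ("author", "title", "text", "url")]
    return " ".join(p for p in parts if p)

record = {"author": "J. Doe", "title": "Shocking claim!",
          "text": "The article body ...", "url": "http://example.com/1"}
model_input = build_model_input(record)
```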
        <p>By keeping preprocessing minimal, we aimed to reveal the sophisticated language understanding
capabilities of BERT models. This approach allows the models to work with text that closely
resembles what it encountered during the pre-training phase, potentially improving its ability to
detect nuanced signals of fake content across various writing styles and formats.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Evaluation metrics</title>
<p>To compare the forecasting performance of the proposed models we used the Accuracy metric and the F1-score.
Accuracy characterizes the share of correct answers of the classifier and can be calculated as</p>
        <p>Accuracy = (TP + TN) / (P + N) · 100%,</p>
        <p>where TP and TN are the number of correctly estimated positive (articles with fake news) and
negative (articles without fakes) classes, respectively; P and N are the actual number of
representatives of each class, respectively.</p>
        <p>The F1-score provides a balanced measure of a model's performance, particularly when the
dataset is imbalanced, i.e. when the number of positive and negative instances is significantly
different. F1-score is calculated as:</p>
        <p>F1 = 2 · Precision · Recall / (Precision + Recall),</p>
        <p>where</p>
        <p>Precision = TP / (TP + FP), Recall = TP / (TP + FN),</p>
        <p>and FP, FN are false positive (predicting fake news when there is none) and false negative (assessing
news as real when it is fake) classes, respectively.</p>
        <p>Note that since we are primarily interested in the correct identification of fakes, we have chosen
them as a positive class (label 1).</p>
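<p>The metrics above translate directly into code; this sketch computes them from the four counts, with toy numbers for illustration only:</p>

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy (in percent), precision, recall and F1 computed directly from
    the counts defined above, with fakes as the positive class (label 1)."""
    p, n = tp + fn, tn + fp                # actual positives / negatives
    accuracy = (tp + tn) / (p + n) * 100   # share of correct answers, %
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy counts for illustration
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```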
        <p>We also calculated Confusion Matrix, which provides a comprehensive view of the model's
performance, allowing for the calculation of various metrics and providing insights into the types of
errors the model is making. It's particularly useful for understanding the compromises between
different types of misclassifications and for fine-tuning the model to meet specific performance
criteria.</p>
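<p>When computing the confusion matrix with scikit-learn, as we did, the ordering convention matters; this sketch (toy labels, fakes as the positive class 1) shows how the four counts are recovered:</p>

```python
from sklearn.metrics import confusion_matrix

# scikit-learn returns the matrix as [[TN, FP], [FN, TP]] for binary labels,
# so ravel() yields the counts in the order tn, fp, fn, tp.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```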
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Empirical results</title>
<p>The results presented in Tables 2 and 3 highlight the performance of various BERT-type models on
the WELFake and PolitiFact datasets, providing insights into their strengths and weaknesses in the
context of fake news detection. The metrics of accuracy and F1-score, along with training time and
the number of trainable parameters, provide a basis for a comprehensive comparison of these models.</p>
<p>Table 2
Model | Number of trainable parameters | Training time, sec | Accuracy | F1-score
BERT base | 7,088,641 | 4733 | 0.985 | 0.985
RoBERTa | 7,679,233 | 2025 | 0.998 | 0.994
DistilBERT | 7,680,002 | 4410 | 0.985 | 0.985
XLM-RoBERTa | 7,680,002 | 2048 | 0.994 | 0.991</p>
        <p>RoBERTa emerges as the best classifier with an accuracy of 0.998 and an F1-score of 0.994 on the
WELFake dataset. This suggests that RoBERTa is particularly adept at capturing the nuances of the
language used in fake news, allowing it to make very accurate predictions. The relatively short
training time of 2025 seconds further highlights the effectiveness of RoBERTa, indicating that it can
quickly process and adapt to the dataset, making it a strong candidate for real-world applications
where both accuracy and speed are important.</p>
<p>XLM-RoBERTa also demonstrated strong performance with an accuracy of 0.994 and an F1-score
of 0.991. XLM-RoBERTa's multilingual pre-training allows it to
effectively handle the diverse linguistic features present in the WELFake dataset. However, despite
the high accuracy, it took slightly longer to train (2048 seconds) compared to RoBERTa.</p>
        <p>BERT base and DistilBERT achieved an accuracy of 0.985 with a corresponding F1-score. While
they performed well, their accuracy did not reach the accuracy of RoBERTa or XLM-RoBERTa. This
suggests that while the basic BERT architecture can effectively classify fake news, the additional
fine-tuning and optimization present in RoBERTa and XLM-RoBERTa provide a noticeable
advantage. Moreover, these two models took almost twice as long to train. It should be noted that
the lighter DistilBERT architecture did not yield a significant reduction in training time (it
took 4410 seconds compared to 4733 for BERT base), which does not make it an efficient model in
terms of computing resources.</p>
        <p>On the PolitiFact dataset, the performance landscape shifts, as shown in Table 3. Here, DistilBERT
outperforms the other models, achieving an accuracy of 0.917 and an F1-score of 0.931. This result is
particularly noteworthy because it demonstrates that DistilBERT, despite being a lighter and more
compact version of BERT, can achieve higher accuracy on smaller datasets. Its reduced number
of trainable parameters makes it easier to train and adapt, especially when computational resources
are a constraint. The relatively short training time of 9.8 seconds further underscores its efficiency.
BERT base followed DistilBERT with an accuracy of 0.901 and an F1-score of 0.912. This performance
suggests that the original BERT architecture remains highly effective for fake news detection,
particularly when fine-tuned. However, its longer training
time of 12.6 seconds compared to DistilBERT reflects the additional computational demands of its
more complex architecture.</p>
<p>RoBERTa, which excelled on the WELFake dataset, achieved an accuracy of 0.891 and an
F1-score of 0.891 on PolitiFact. This indicates that while RoBERTa is highly effective with larger datasets
like WELFake, it may not generalize as well to smaller datasets like PolitiFact without further
fine-tuning. Its training time was slightly lower than BERT's, at 12.2 seconds, suggesting some
computational efficiency, but it also had lower accuracy.</p>
        <p>XLM-RoBERTa achieved the lowest accuracy on the PolitiFact dataset at 0.872, with an F1-score
of 0.883. This could be due to its design, which is optimized for multilingual tasks rather than
domain-specific datasets like PolitiFact. Although it is highly versatile across different languages and
contexts, this versatility may result in decreased performance when applied to a narrower task. The
training time for XLM-RoBERTa was also substantial at 11.1 seconds, indicating that it is not the
most efficient choice for this particular dataset.</p>
        <p>Figures 4, 5 show the loss and accuracy graphs for the best models for the datasets we used, and
Figures 6, 7 present the confusion matrices for these models.
The graphs in Figure 4 illustrate the RoBERTa model's learning progress over the course of training
epochs. The loss curve shows a steady decline, indicating that the model is successfully minimizing
classification error as training proceeds. Simultaneously, the accuracy graph rises, reflecting the
model's increasing ability to correctly classify fake news instances. The convergence of the loss and
accuracy curves suggests that the model has learned effectively from the training data without
significant signs of overfitting. The smoothness of the curves highlights the stability of RoBERTa
during training, which contributes to its high performance on the WELFake dataset.
Figure 5 shows the loss and accuracy graphs for the DistilBERT model on the PolitiFact dataset.
Compared to RoBERTa, the DistilBERT model's loss decreases at a faster rate, indicating a more rapid
adaptation to the training data. The accuracy also increases steadily, suggesting that DistilBERT
quickly learns to distinguish between real and fake news articles. Despite the smaller architecture of
DistilBERT, the model achieves high accuracy after a few epochs, which makes it particularly suitable
for scenarios where computational resources or training time are limited. The relatively sharp drop
in loss and corresponding rise in accuracy suggest that the model efficiently utilizes the information
from the PolitiFact dataset.
The matrix in Figure 7 reveals that DistilBERT is accurate in identifying fake news, as reflected by
the high TP rate. The model's ability to minimize FN is crucial in a context where failing to identify
fake news can have significant consequences. The overall balance in correct classifications across
both classes reflects the model's adaptability to the shorter and more concise news articles, typical
of the PolitiFact dataset.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and conclusion</title>
<p>The results of our study illustrate the strengths and trade-offs of different BERT-based models in
the context of fake news detection. RoBERTa demonstrated exceptional performance on the WELFake
dataset, achieving an accuracy of 0.998, highlighting its ability to handle complex and diverse text
data. Its robust training process, which focuses heavily on the masked language modeling task,
allows RoBERTa to capture subtle linguistic cues and contextual relationships that are often
indicative of fake news. This makes RoBERTa a suitable model for applications where high accuracy is
paramount, even if it comes at the cost of increased computational requirements.</p>
      <p>DistilBERT, on the other hand, excelled on the smaller PolitiFact dataset, where it achieved an
accuracy of 0.917. Its lightweight architecture, derived from the knowledge distillation process,
enables it to learn efficiently from fewer data points while maintaining a high level of accuracy. This
makes DistilBERT an ideal choice in scenarios where computational resources are limited, such as
real-time fake news detection on edge devices. The model's rapid convergence and lower training
time also make it more practical for applications that require quick deployment and frequent
retraining.</p>
      <p>However, the study also highlights certain limitations associated with each model. While
RoBERTa offers superior accuracy on larger datasets like WELFake, its performance on the smaller
PolitiFact dataset was relatively low, with an accuracy of 0.891. This suggests that the model's
complexity might require further fine-tuning to adapt to datasets with shorter text lengths and less
diverse content.</p>
      <p>XLM-RoBERTa's results provide additional insights into the role of multilingual models in fake
news detection. Its high accuracy on the WELFake dataset (0.994) suggests that cross-lingual
training can enhance a model's ability to generalize across diverse linguistic styles. However, its
relatively lower performance on the domain-specific PolitiFact dataset (accuracy of 0.872) indicates
that models optimized for multilingual capabilities may not always perform best on specific,
monolingual datasets without additional fine-tuning. This points to a potential compromise between
multilingual versatility and domain-specific accuracy that researchers must consider when selecting
models for fake news detection.</p>
<p>Overall, the comparison between these BERT-based models suggests that there is no
one-size-fits-all solution for fake news detection. The choice of model depends largely on the characteristics
of the dataset, the computational resources available, and the specific requirements of the
application. For large-scale fake news detection tasks where accuracy is critical, RoBERTa is likely
the most effective choice. For environments where speed and resource efficiency are priorities, such
as mobile platforms or real-time applications, DistilBERT provides a viable alternative with its
compact structure and faster training time.</p>
      <p>Another key finding of our study is the importance of dataset diversity and structure in
influencing model performance. The WELFake dataset, with its large volume and longer average
text length, allowed RoBERTa and XLM-RoBERTa to excel by leveraging their deeper architectures
and advanced contextual understanding. Meanwhile, the PolitiFact dataset, characterized by shorter
text samples, contributed to the effectiveness of DistilBERT in learning from more concise linguistic
patterns. These differences emphasize the need for tailored approaches in selecting models for fake
news detection, depending on the dataset's nature.</p>
      <p>The study also underscores the role of minimal preprocessing in leveraging the strengths of
BERT-based models. By allowing the models to handle raw text inputs, we preserved linguistic
nuances that are critical for distinguishing fake news. This approach highlights the power of
pre-trained models in adapting to specific tasks without extensive preprocessing, making them versatile
tools for a wide range of applications in the field of NLP.</p>
<p>In conclusion, it should be noted that the proposed approach to detecting fake content in digital media,
based on fine-tuned models such as BERT, focuses on understanding linguistic nuances and
contextual relationships in text. However, these models do not directly check the factual content of
claims against external databases or sources. Instead, they work by detecting hidden linguistic
features and patterns commonly associated with fake or misleading information.</p>
      <p>This approach is advantageous in situations where external fact-checking is either impossible or
time-consuming, but it also introduces certain limitations as the models rely heavily on linguistic
cues rather than external verification.</p>
<p>Thus, the findings of this study contribute to the broader effort of developing reliable AI-driven
tools for combating misinformation.
Future research could explore further fine-tuning techniques and hybrid approaches that combine
the strengths of multiple models to create even more robust solutions for fake news detection.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <sec id="sec-7-1">
<title>This paper is a part of the project supported by the Swedish Institute within the Baltic Sea Neighbourhood Programme (project No. 00152/2024).</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>The authors have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
[1] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, ... J. L. Zittrain, The science of fake news, Science 359 (6380) (2018) 1094-1096. doi:10.1126/science.aao2998.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
[2] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake News Detection on Social Media: A Data Mining Perspective, ACM SIGKDD Explorations Newsletter 19 (1) (2017) 22-36. doi:10.1145/3137597.3137600.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[3] V. Derbentsev, V. Bezkorovainyi, R. Akhmedov, Machine learning approach of analysis of emotional polarity of electronic social media, Neuro-Fuzzy Modeling Techniques in Economics 9 (2020) 95-137. doi:10.33111/nfmte.2020.095.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
[4] V. Derbentsev, V. Bezkorovainyi, A. Matviychuk, O. Pomazun, A. Hrabariev, A. Hostryk, A comparative study of deep learning models for sentiment analysis of social media texts, CEUR Workshop Proceedings 3465 (2023) 168-188. URL: https://ceur-ws.org/Vol-3465/paper18.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
[5] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, ... D. Amodei, Language Models are Few-Shot Learners, arXiv (2020). URL: https://arxiv.org/abs/2005.14165.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
[6] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv (2018). URL: https://arxiv.org/abs/1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zellers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roesner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <source>Defending Against Neural Fake News</source>
          , arXiv (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/1905.12616.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <article-title>Universal Language Model Fine-tuning for Text Classification</article-title>
          , arXiv (
          <year>2018</year>
          ). URL: https://arxiv.org/abs/1801.06146.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gebru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McMillan-Major</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shmitchell</surname>
          </string-name>
          ,
          <article-title>On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          , ACM Press, Canada,
          <year>2021</year>
          , pp.
          <fpage>610</fpage>
          -
          <lpage>623</lpage>
          . doi:10.1145/3442188.3445922.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Hadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alshara</surname>
          </string-name>
          ,
          <article-title>Fake news detection revisited: An extensive review of theoretical frameworks, dataset assessments, model constraints, and forward-looking research agendas</article-title>
          ,
          <source>Technologies</source>
          <volume>12</volume>
          (
          <issue>11</issue>
          ) (
          <year>2024</year>
          )
          222. doi:10.3390/technologies12110222.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>An overview of fake news detection: From a new perspective</article-title>
          ,
          <source>Fundamental Research</source>
          (
          <year>2024</year>
          ). doi:10.1016/j.fmre.2024.01.017.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Alghamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>A comparative study of machine learning and deep learning techniques for fake news detection</article-title>
          ,
          <source>Information</source>
          <volume>13</volume>
          (
          <issue>12</issue>
          ) (
          <year>2022</year>
          )
          576. doi:10.3390/info13120576.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Hamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J. Ab</given-names>
            <surname>Aziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Yaakub</surname>
          </string-name>
          ,
          <article-title>A review of fake news detection approaches: A critical analysis of relevant studies and highlighting key challenges associated with the dataset, feature representation, and data fusion</article-title>
          ,
          <source>Heliyon</source>
          <volume>9</volume>
          (
          <issue>10</issue>
          ) (
          <year>2023</year>
          )
          e20382. doi:10.1016/j.heliyon.2023.e20382.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <article-title>Enhancing Mood Detection in Textual Analysis through Fuzzy Logic Integration</article-title>
          ,
          <source>in: 2024 14th International Conference on Advanced Computer Information Technologies, ACIT</source>
          , IEEE, Ceske Budejovice
          , Czech Republic,
          <year>2024</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>26</lpage>
          . doi:10.1109/ACIT62333.2024.10712628.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hraniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mazur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Matvijchuk</surname>
          </string-name>
          ,
          <article-title>Artificial neural-like network as a basis for forming logical conclusions in systems of exceptional complexity</article-title>
          ,
          <source>Neuro-Fuzzy Modeling Techniques in Economics</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <fpage>65</fpage>
          -
          <lpage>94</lpage>
          . doi:10.33111/nfmte.2020.065.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Syniehub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kozlovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lavrov</surname>
          </string-name>
          ,
          <article-title>Intellectual capital management of the business community based on the neuro-fuzzy hybrid system</article-title>
          ,
          <source>Neuro-Fuzzy Modeling Techniques in Economics</source>
          <volume>11</volume>
          (
          <year>2022</year>
          )
          <fpage>25</fpage>
          -
          <lpage>47</lpage>
          . doi:10.33111/nfmte.2022.025.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Matviychuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lukianenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Miroshnychenko</surname>
          </string-name>
          ,
          <article-title>Neuro-fuzzy model of country's investment potential assessment</article-title>
          ,
          <source>Fuzzy Economic Review</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          )
          (
          <year>2019</year>
          )
          <fpage>65</fpage>
          -
          <lpage>88</lpage>
          . doi:10.25102/fer.2019.02.04.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Tanvir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Mahir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akhter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Huq</surname>
          </string-name>
          ,
          <article-title>Detecting fake news using machine learning and deep learning algorithms</article-title>
          ,
          <source>in: 2019 7th International Conference on Smart Computing &amp; Communications</source>
          ,
          ICSCC
          , IEEE, Sarawak, Malaysia,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1109/ICSCC.2019.8843612.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Nasir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Varlamis</surname>
          </string-name>
          ,
          <article-title>Fake news detection: A hybrid CNN-RNN-based deep learning approach</article-title>
          ,
          <source>International Journal of Information Management Data Insights</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          100007. doi:10.1016/j.jjimei.2020.100007.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tipper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Atlam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Lallie</surname>
          </string-name>
          ,
          <article-title>An investigation into the utilisation of CNN with LSTM for video deepfake detection</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>14</volume>
          (
          <issue>21</issue>
          ) (
          <year>2024</year>
          )
          9754. doi:10.3390/app14219754.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Paka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sengupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>Cross-SEAN: A cross-stitch semi-supervised neural attention model for COVID-19 fake news detection</article-title>
          ,
          <source>Applied Soft Computing</source>
          <volume>107</volume>
          (
          <year>2021</year>
          )
          107393. doi:10.1016/j.asoc.2021.107393.
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Radhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A. H.</given-names>
            <surname>Al Naffakh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-I.</given-names>
            <surname>Fuqdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Hakim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Al-Attar</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of machine learning-based models for fake news detection</article-title>
          ,
          <source>BIO Web of Conferences</source>
          <volume>97</volume>
          (
          <year>2024</year>
          )
          00123. doi:10.1051/bioconf/20249700123.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Kaliyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <article-title>FakeBERT: Fake news detection in social media with a BERT-based deep learning approach</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>80</volume>
          (
          <issue>8</issue>
          ) (
          <year>2021</year>
          )
          <fpage>11765</fpage>
          -
          <lpage>11788</lpage>
          . doi:10.1007/s11042-020-10183-2.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. Q.</given-names>
            <surname>Alnabhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Branco</surname>
          </string-name>
          ,
          <article-title>BERTGuard: Two-tiered multi-domain fake news detection with class imbalance mitigation</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>8</volume>
          (
          <issue>8</issue>
          ) (
          <year>2024</year>
          )
          93. doi:10.3390/bdcc8080093.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Juneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nauman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Muhammad</surname>
          </string-name>
          ,
          <article-title>GBERT: A hybrid deep learning model based on GPT-BERT for fake news detection</article-title>
          .
          <source>Heliyon</source>
          <volume>10</volume>
          (
          <issue>16</issue>
          ) (
          <year>2024</year>
          )
          e35865. doi:10.1016/j.heliyon.2024.e35865.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          , arXiv (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/1910.01108.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Soricut</surname>
          </string-name>
          ,
          <article-title>ALBERT: A lite BERT for self-supervised learning of language representations</article-title>
          ,
          <source>in: International Conference on Learning Representations, ICLR</source>
          <year>2020</year>
          ,
          OpenReview
          , Addis Ababa, Ethiopia,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . URL: https://openreview.net/forum?id=H1eA7AEtvS.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          ,
          <source>in: International Conference on Learning Representations, ICLR</source>
          <year>2020</year>
          ,
          OpenReview
          , Addis Ababa, Ethiopia,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . URL: https://openreview.net/forum?id=r1xMH1BtvB.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          FakeNewsNet Dataset
          ,
          <year>2019</year>
          . URL: https://github.com/KaiDMML/FakeNewsNet.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prodan</surname>
          </string-name>
          ,
          <article-title>WELFake dataset for fake news detection in text data</article-title>
          ,
          <year>2021</year>
          . doi:10.1109/TCSS.2021.3068519.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>