<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Mach. Learn.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/s10994-024-06581-4</article-id>
      <title-group>
        <article-title>Benchmarking Active Learning Techniques: Insights from Multi-Domain Fake News Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergio Flesca</string-name>
          <email>sergio.flesca@unical.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Gagliardi</string-name>
          <email>marco.gagliardi@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Maurmo</string-name>
          <email>danilo.maurmo@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Scala</string-name>
          <email>francesco.scala@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Vocaturo</string-name>
          <email>eugenio.vocaturo@cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNR-ICAR Institute of High Performance Computing and Networking of the National Research Council</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CNR-NANOTEC Institute of Nanotechnology of the National Research Council</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>DIMES - Department of Computer Engineering, Electronic Modeling, and Systems Engineering, University of Calabria</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <volume>113</volume>
      <issue>2024</issue>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>The rapid spread of fake news poses a major societal challenge, requiring efficient and generalizable detection methods. Active Learning offers a viable solution by reducing annotation costs while enhancing model performance. This study benchmarks multiple Active Learning strategies for fake news detection across two distinct domains: political discourse (Politifact) and entertainment news (GossipCop). We evaluate uncertainty-based methods (Entropy Sampling, Least Confidence) alongside more advanced techniques (Core-Set, K-Means, BADGE, BALD), assessing their effectiveness, efficiency and sustainability. Our findings highlight Entropy Sampling as the most accurate approach, particularly in the political domain, while K-Means emerges as the most computationally efficient. Additionally, we analyze the environmental impact of Active Learning-based training, underscoring its role in optimizing both performance and resource consumption. These insights contribute to the development of scalable and energy-efficient misinformation detection systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Active learning</kwd>
        <kwd>Cross-domain</kwd>
        <kwd>Multi-domain</kwd>
        <kwd>fake news detection</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid development of the World Wide Web since the mid-1990s has significantly transformed
the way people communicate. Online social media platforms such as X (formerly Twitter) and
Facebook enable real-time information spread. Social media has become the primary platform
for online interaction and information exchange thanks to its ease of use, low cost and fast
dissemination [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, the internet has also become a place for sharing fake news, such
as misleading information, fake reviews, deceptive advertisements, rumors and false political
statements. As a result, fake news has emerged as a major issue for both industry and academia,
as it is widely used to mislead and manipulate online users with biased or false information. Fake
news can have dangerous effects on both individuals and society [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. First, it can mislead people,
leading them to accept false beliefs. A well-known study by Pennycook et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] demonstrates
that repeated exposure to fake news increases its perceived truthfulness, a phenomenon
known as the implied truth effect. Even individuals with higher media literacy are susceptible
to this cognitive bias, because familiarity encourages a sense of reliability. Second, fake news
can alter how people perceive and react to true news. When individuals encounter debunked
fake news, they may develop an increased scepticism toward all news, including legitimate
reports. Lewandowsky et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] describe the backfire effect: attempts to correct false beliefs
can inadvertently reinforce them. Third, the widespread dissemination of fake news can undermine
the credibility of the entire news ecosystem. A study by Lazer et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] highlights how the
large-scale spread of disinformation affects democracies by fostering distrust in key
institutions, including the media, healthcare and government. Thus, fake news has become a
serious social problem that cannot be ignored.
      </p>
      <p>Contribution. In this paper we analyze and compare various active learning
techniques for the detection of fake news, evaluating their performance to identify the most effective
approaches across domains. Both classification-based and emission-based metrics are
employed to provide a thorough evaluation. In particular, the inclusion of energy consumption
and carbon emissions as evaluation criteria underscores the importance of sustainability.</p>
      <p>Organization. The rest of this paper is structured as follows: Section 2 provides an overview
of existing approaches and highlights the role of active learning in fake news detection. Section 3
outlines our methodology, including the active learning techniques, the pipeline and the evaluation
metrics used. Section 4 presents our experimental study, describing the datasets and offering a
comparative analysis of the methods across them. Finally, Section 5
summarizes key findings, discusses the main contributions and limitations of our
work, and suggests directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>
        Developing a text classifier for a new problem requires access to training data and their
corresponding labels. Labeling is typically performed by human annotators, and the common
approach involves labeling as many text documents as possible, training a classifier and
acquiring additional data and labels if the performance is insufficient. However, randomly selecting
documents to extend the dataset can be inefficient, because the newly added documents may
not add valuable information for classification. Active learning aims to optimize this process
by selecting the most “difficult” unlabeled documents and querying annotators for their
labels [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8, 9</xref>
        ]. This strategy has the potential to significantly reduce the effort required for
developing a new classification system [10].
      </p>
      <sec id="sec-2-1">
        <title>2.1. Active Learning applied to cross-domain Fake News Detection</title>
        <p>Some recent studies have explored active learning with various methodologies to improve the
efficiency and accuracy of misinformation detection [11, 12]. The study by Bhattacharjee [13]
shifts the focus to news veracity detection, proposing a collaborative human-machine learning
framework. The approach employs a deep-shallow fusion model, where insights from a deep
learning classifier are combined with shallow feature-based models leveraging linguistic and
content-specific features. This allows the model to adapt to various domains or news genres,
improving its generalizability. The results indicate that this fusion-based method leads to a
25% improvement in performance, with the model requiring significantly fewer annotated
samples. The combination of shallow and deep features, along with dynamic feature weighting,
proves to be a powerful strategy for detecting fake news. The use of prioritized active learning
ensures that only the most “difficult” samples are selected for human annotation, further
reducing the labeling cost. While many fake news detection frameworks rely exclusively on deep
learning, this work advocates combining deep and shallow models to capture both detailed
contextual information and broader linguistic patterns. This fusion approach ensures a more
comprehensive analysis of news content.</p>
        <p>
          Farinneya et al. [14] explore the challenge of detecting rumors on social media, focusing
particularly on Twitter data. The goal is to leverage the limited domain-specific labeled data and
utilize information from other domains by combining Active Learning with Transfer Learning.
Various active learning strategies are applied, including Least Confidence [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and Query by
Committee [15]. The results demonstrate that the method achieves performance
comparable to fully supervised models while using only 42% of the labeled dataset, with faster
convergence in terms of F1 score. Methodologically, this work distinguishes
itself by its ability to apply knowledge from different domains to enhance rumor detection.
This is crucial, as rumors in one context may share characteristics with those in other domains.
Additionally, the use of TweetBERT for textual data representation is notable, as it is specifically
fine-tuned for social media content, contributing to the solid performance observed.
        </p>
        <p>The study by Sahan et al. [10] adopts a more traditional approach to fake news detection,
focusing on text-based classification using Active Learning (AL). The authors explore how
standard AL techniques can reduce the labeling burden in text-based fake news detection
tasks. Unlike the domain adaptation discussed earlier, this study focuses on selecting the most
informative text samples for human labeling based on uncertainty. This study serves as a
useful benchmark, showing how traditional AL methods can play a valuable role in text-based
disinformation detection, especially when compared to more complex techniques like domain
adaptation or geometric deep learning.</p>
        <p>Lee et al. [16] take a different approach by addressing the issue of domain adaptation in
disinformation detection, particularly for emerging topics such as the COVID-19 pandemic or
the war in Ukraine. In these cases, existing models trained on older or unrelated domains may
perform poorly due to differences in data distribution between the source and target domains.
To address this challenge, the authors propose an energy-based domain adaptation framework that
integrates active learning. The key innovation is the combination of energy-based models
with AL to transfer knowledge from domains with abundant labeled data to those with limited
labeled data. The model minimizes the energy gap between the source and target domains to
align their representations, ensuring that knowledge from source domains can be effectively
applied to emerging domains. AL is used to selectively label the most uncertain samples in the
target domain, thereby improving model performance with minimal labeled data. Experiments
show that this method improves accuracy by 10% in domain adaptation tasks compared to
baseline models, highlighting the framework’s effectiveness in adapting to new and evolving
disinformation contexts. This work is notable for its focus on cross-domain learning, an
often-overlooked area in disinformation detection research, but one that is crucial for addressing
misinformation on new and unfamiliar topics.</p>
        <p>Kato et al. [17] examine domain bias in supervised fake news detection. The study highlights
how models trained on domain-specific datasets struggle with generalization and proposes a
strategy to mitigate this bias using paired datasets. Using FakeNewsAMT, a dataset with paired
real and fake news on the same topic, the authors analyze domain bias through deep learning
models, particularly BERT [18]. They experiment with noun phrase masking to mitigate bias
but observe no improvement in accuracy, indicating that named entities are not the primary
source of domain bias. Instead, they identify significant lexical overlap between paired real and
fake news, which helps models generalize better across domains. Comparing models trained on
paired versus unpaired datasets, they observe that paired data improve cross-domain detection
accuracy significantly in most cases. This suggests that dataset structure, specifically pairing
real and fake news with similar lexical patterns, plays a key role in mitigating domain bias.
This study contributes to domain adaptation research by showing that dataset design itself
can enhance model generalization, offering an alternative to adversarial training and
domain-invariant feature extraction. Future research could explore similar dataset structures in different
misinformation contexts, such as scientific or health-related fake news.</p>
        <p>Barnabò et al. [19] explore the application of graph neural networks (GNNs) for misinformation
detection. This study highlights the shift from traditional text classification methods to geometric
deep learning, focusing on the dissemination patterns of news articles across social networks.
The authors argue that fake news detection should not rely solely on news content but also
consider how information spreads through networks. The main contribution of this study is
the development of Deep Error Sampling, a novel AL strategy that selects samples based on
prediction errors combined with uncertainty sampling. The advantage of using GNNs in this
context is their ability to model relationships between users and the articles they share, thus
capturing the social dynamics of disinformation propagation. By applying AL to GNNs, the authors
reduce the annotation burden for human fact-checkers, who play a crucial role in verifying the
authenticity of news articles. Results show that Deep Error Sampling outperforms traditional
AL methods, reducing labeling costs by up to 25% while achieving a 2% improvement in the
area under the curve. This approach highlights the potential of graph-based representations for
fake news detection, especially when combined with AL to enhance efficiency. While previous
studies primarily focused on text-based models, this work demonstrates that understanding
the propagation behavior of disinformation is equally important, offering a new perspective on
how news is classified as true or false.</p>
        <p>The study by Folino et al. [20] introduces an approach addressing the challenge of resource
efficiency in fake news detection. The authors propose a semi-supervised method that combines
active learning (AL) with a BERT model, seeking to reduce both computational costs and human
effort. This is particularly important in scenarios
with significant constraints on memory, time and energy, such as small organizations or
nonprofits lacking resources for large-scale model training. Compared to other approaches, the
key contribution of this work lies in its resource-efficient design. While many studies focus on
maximizing model performance, this work emphasizes the importance of balancing accuracy
with computational demands, especially when deploying pretrained language models in
real-world applications [21, 22].</p>
        <p>All these studies tackle the challenge of fake news detection using active learning,
but the diversity of methodologies underlines that there is no single solution. The choice
of methodology depends on data characteristics, resource availability and the nature of the fake
news. Approaches can differ significantly: some optimize computational efficiency, while others
focus on network dynamics or domain adaptation. One study integrates transfer learning with
active learning, another employs feature representation, another utilizes heterogeneous
graph neural networks to capture relationships among diverse entities, and yet another addresses
multilingual fake news detection via a multi-model neural ensemble. These strategies showcase
distinct strengths and offer complementary solutions.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Active Learning techniques</title>
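        <p>
          Uncertainty sampling: Least Confidence and Entropy-based algorithms. The simplest
and most commonly used query framework is uncertainty sampling [23]. In this approach, an active
learner selects the instances for which it has the highest uncertainty in label assignment.
        </p>
        <p>
          • Least Confidence [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: let <inline-formula><tex-math>p^{*}</tex-math></inline-formula> be the probability of the most likely class for a data instance
<inline-formula><tex-math>x</tex-math></inline-formula>. Then the least confidence score assigned to <inline-formula><tex-math>x</tex-math></inline-formula> is simply computed as <inline-formula><tex-math>1 - p^{*}</tex-math></inline-formula>, and the queried instance is:
        </p>
        <disp-formula><label>(1)</label><tex-math><![CDATA[x^{*}_{LC} = \arg\min_{x} P(y^{*} \mid x; \theta)]]></tex-math></disp-formula>
        <p>where <inline-formula><tex-math>y^{*} = \arg\max_{y} P(y \mid x; \theta)</tex-math></inline-formula> represents the most likely class label;</p>
        <p>• Entropy [24]: measures the overall uncertainty across all classes. A high entropy value
indicates the model is unsure about the correct class. For a data instance <inline-formula><tex-math>x</tex-math></inline-formula>, if there are <inline-formula><tex-math>n</tex-math></inline-formula>
classes and <inline-formula><tex-math>p_{i} = P(y_{i} \mid x; \theta)</tex-math></inline-formula> is the probability of the <inline-formula><tex-math>i</tex-math></inline-formula>-th class, the instance with the highest entropy is selected:</p>
        <disp-formula><label>(2)</label><tex-math><![CDATA[x^{*}_{H} = \arg\max_{x} \; - \sum_{i} P(y_{i} \mid x; \theta) \log P(y_{i} \mid x; \theta)]]></tex-math></disp-formula>
        <p>where <inline-formula><tex-math>y_{i}</tex-math></inline-formula> ranges over all possible labels. Entropy is an information-theoretic measure that
quantifies the uncertainty of a distribution and is often used as an impurity indicator.</p>
        <p>CoreSet sampling. The core-set selection method proposed in [25] identifies a subset of
data points such that training a model on this subset achieves competitive performance on the
entire dataset. In Active Learning, only a sub-sampled pool of labeled data is available, with a
query budget <inline-formula><tex-math>b</tex-math></inline-formula> and a learning algorithm <inline-formula><tex-math>A_{s}</tex-math></inline-formula> that outputs parameters <inline-formula><tex-math>w</tex-math></inline-formula> given a labeled set <inline-formula><tex-math>s</tex-math></inline-formula>.
Denoting by <inline-formula><tex-math>s^{0}</tex-math></inline-formula> the initially labeled set and by <inline-formula><tex-math>s^{1}</tex-math></inline-formula> the set of newly queried points, the pool-based optimization problem is defined as:</p>
        <disp-formula><label>(3)</label><tex-math><![CDATA[\min_{s^{1} : |s^{1}| \le b} \; \mathbb{E}_{x, y \sim p_{\mathcal{Z}}} \left[ l(x, y; A_{s^{0} \cup s^{1}}) \right]]]></tex-math></disp-formula>
        <p>Batch Active Learning extends this by selecting a large batch of points for labeling in each
iteration. The AL loss is upper-bounded as:</p>
        <disp-formula><label>(4)</label><tex-math><![CDATA[\begin{aligned}
\mathbb{E}_{x, y \sim p_{\mathcal{Z}}} \left[ l(x, y; A_{s}) \right] \le{} & \underbrace{\left| \mathbb{E}_{x, y \sim p_{\mathcal{Z}}} \left[ l(x, y; A_{s}) \right] - \frac{1}{n} \sum_{i \in [n]} l(x_{i}, y_{i}; A_{s}) \right|}_{\text{Generalization Error}} \\
& + \underbrace{\frac{1}{|s|} \sum_{j \in s} l(x_{j}, y_{j}; A_{s})}_{\text{Training Error}} \\
& + \underbrace{\left| \frac{1}{n} \sum_{i \in [n]} l(x_{i}, y_{i}; A_{s}) - \frac{1}{|s|} \sum_{j \in s} l(x_{j}, y_{j}; A_{s}) \right|}_{\text{Core-Set Loss}}
\end{aligned}]]></tex-math></disp-formula>
        <p>The population risk of the model trained on <inline-formula><tex-math>s</tex-math></inline-formula> is influenced by Training Error, Generalization
Error, and Core-Set Loss. Given that CNNs exhibit low Training Error and Generalization Error
is provably bounded, the key challenge is minimizing Core-Set Loss:</p>
        <disp-formula><label>(5)</label><tex-math><![CDATA[\min_{s^{1} : |s^{1}| \le b} \left| \frac{1}{n} \sum_{i \in [n]} l(x_{i}, y_{i}; A_{s^{0} \cup s^{1}}) - \frac{1}{|s^{0} + s^{1}|} \sum_{j \in s^{0} \cup s^{1}} l(x_{j}, y_{j}; A_{s^{0} \cup s^{1}}) \right|]]></tex-math></disp-formula>
        <p>K-means sampling. K-means sampling in active learning, introduced in [26], addresses the
limitations of traditional uncertainty sampling. While uncertainty sampling selects the most
uncertain instance near the decision boundary, it does not consider the overall data distribution,
which is crucial for batch selection. K-means sampling overcomes this by first identifying a set
of uncertain instances within the margin and then clustering them based on feature similarity.
From each cluster, the most central instance (the medoid) is selected for labeling, ensuring
that the chosen samples are both informative and representative of different regions of the
feature space. This method offers several advantages. It prevents the classifier from focusing
too narrowly on specific areas by preserving the density distribution of the dataset, avoiding
redundancy, and promoting diversity in the selected samples. This is particularly beneficial in
high-dimensional domains like text classification, where redundant training samples can slow
down model convergence. Additionally, representative sampling contributes to a more balanced
reduction of the hypothesis space, improving generalization.</p>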
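        <p>As a minimal sketch of the two uncertainty scores introduced at the start of this subsection, the following standard-library Python computes the least confidence and entropy scores from predicted class probabilities and ranks a pool accordingly. The function names and the example probabilities are illustrative assumptions and do not come from the experimental code.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def least_confidence(probs):
    """Least confidence score 1 - p*, where p* is the top class probability."""
    return 1.0 - max(probs)


def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, score_fn, k):
    """Return the indices of the k pool instances with the highest score."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted distributions (fake, real) for four unlabeled articles.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]
lc_pick = select_most_uncertain(pool_probs, least_confidence, k=2)
ent_pick = select_most_uncertain(pool_probs, entropy, k=2)
```
]]></preformat>
        <p>With two classes both scores are monotone in the top class probability, so they produce the same ranking; they can differ once more than two classes are involved.</p>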
        <p>BALD: Bayesian Active Learning by Disagreement. BALD [27] is an Active Learning
approach originally applied to Gaussian Processes for classification. It expresses information gain through
predictive entropies, identifying the set of points among the unlabeled examples that maximize
the expected decrease in Shannon entropy [24]. In this framework, a set of points is considered,
consisting of some labeled examples and others with unknown labels <inline-formula><tex-math>y</tex-math></inline-formula>, along with
a group of latent parameters <inline-formula><tex-math>\theta</tex-math></inline-formula> that govern the dependency between input data and their
labels, <inline-formula><tex-math>p(y \mid x, \theta)</tex-math></inline-formula>. The central objective is to reduce the number of possible hypotheses as
quickly as possible, minimizing the uncertainty over the model parameters as measured by entropy. The
points in <inline-formula><tex-math>\mathcal{D}'</tex-math></inline-formula> are chosen from the pool of unlabeled examples to maximize the expected entropy decrease, meaning they
are points on which individual parameter settings make confident predictions but, at the same time, on which there is the
greatest disagreement:</p>
        <disp-formula><label>(6)</label><tex-math><![CDATA[\arg\min_{\mathcal{D}'} \mathrm{H}[\theta \mid \mathcal{D}'] = - \int p(\theta \mid \mathcal{D}') \ln p(\theta \mid \mathcal{D}') \, d\theta]]></tex-math></disp-formula>
        <p>The selection criterion consists of choosing data that maximize the disagreement of the
parameters between the current model and its subsequent updates [28]. Solving equation 6 is
an NP-hard problem; therefore, a greedy approximation strategy is applied, allowing work on
individual elements <inline-formula><tex-math>x</tex-math></inline-formula> rather than the entire set of unlabeled samples <inline-formula><tex-math>\mathcal{D}'</tex-math></inline-formula>. The new objective is
then to find individual points <inline-formula><tex-math>x</tex-math></inline-formula> that maximize the expected entropy decrease:</p>
        <disp-formula><label>(7)</label><tex-math><![CDATA[\arg\max_{x} \; \mathrm{H}[\theta \mid \mathcal{D}] - \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})} \left[ \mathrm{H}[\theta \mid y, x, \mathcal{D}] \right]]]></tex-math></disp-formula>
        <p>BADGE: Batch Active Learning by Diverse Gradient Embeddings. BADGE [29] selects
diverse sets of points with high magnitude when represented in a hallucinated gradient space,
meaning a space of loss gradients computed with the model's own predicted label in place of the unknown true label. This creates a strategy that
incorporates predictive uncertainty with sample diversity in the selected batches without the
need for hand-tuned hyperparameters.</p>
        <p>The algorithm starts with a set <inline-formula><tex-math>S</tex-math></inline-formula> of examples chosen uniformly at random from the pool <inline-formula><tex-math>U</tex-math></inline-formula>,
whose labels are queried. The core steps of BADGE, performed iteratively for
<inline-formula><tex-math>T</tex-math></inline-formula> steps, consist of gradient embedding computation and sampling. Specifically, for each <inline-formula><tex-math>x</tex-math></inline-formula>
belonging to the pool <inline-formula><tex-math>U</tex-math></inline-formula>, the label preferred by the current model is computed, along with the
gradient of the loss on <inline-formula><tex-math>x</tex-math></inline-formula> and the computed label, considering the parameters of the last layer
of the network. Points are then selected from the obtained gradient embedding vectors using
k-means++ initialization. Finally, the labels of these examples are queried, the model is retrained,
and the cycle is repeated.</p>
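        <p>The two core steps described above, gradient embedding and k-means++ selection, can be sketched in standard-library Python as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the loss is assumed to be cross-entropy over a softmax output, so that the last-layer gradient for class <inline-formula><tex-math>i</tex-math></inline-formula> is the hidden representation scaled by the predicted probability minus the one-hot predicted label; the first centre is drawn uniformly at random as in standard k-means++; and the names <monospace>gradient_embedding</monospace>, <monospace>kmeanspp_select</monospace> and the toy pool are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def gradient_embedding(probs, hidden):
    """Last-layer cross-entropy gradient, using the predicted label as target."""
    y_hat = max(range(len(probs)), key=lambda i: probs[i])
    emb = []
    for i, p in enumerate(probs):
        coeff = p - (1.0 if i == y_hat else 0.0)
        emb.extend(coeff * h for h in hidden)
    return emb


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeanspp_select(embeddings, k, rng):
    """k-means++ seeding: return indices of up to k distinct points."""
    n = len(embeddings)
    k = min(k, n)
    if k <= 0:
        return []
    chosen = [rng.randrange(n)]
    # Squared distance from every point to its nearest chosen centre.
    d2 = [sq_dist(e, embeddings[chosen[0]]) for e in embeddings]
    while len(chosen) < k:
        total = sum(d2)
        nxt = None
        if total > 0.0:
            # Sample proportionally to squared distance.
            r = rng.random() * total
            acc = 0.0
            for i, w in enumerate(d2):
                acc += w
                if w > 0.0 and acc >= r:
                    nxt = i
                    break
            if nxt is None:  # guard against floating-point round-off
                nxt = max(range(n), key=lambda i: d2[i])
        else:
            # All remaining points coincide with a centre: take any unused one.
            nxt = next(i for i in range(n) if i not in chosen)
        chosen.append(nxt)
        d2 = [min(old, sq_dist(e, embeddings[nxt]))
              for old, e in zip(d2, embeddings)]
    return chosen


# Hypothetical pool: (predicted probabilities, penultimate-layer features).
pool = [
    ([0.90, 0.10], [1.0, 0.0]),
    ([0.50, 0.50], [0.0, 1.0]),
    ([0.60, 0.40], [1.0, 1.0]),
    ([0.99, 0.01], [0.5, 0.5]),
]
embeddings = [gradient_embedding(p, h) for p, h in pool]
batch = kmeanspp_select(embeddings, k=2, rng=random.Random(0))
```
]]></preformat>
        <p>Confident predictions yield embeddings with small norm, so distance-proportional sampling tends to favour points that are both uncertain and far from the points already chosen.</p>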
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pipeline</title>
        <p>The active learning process follows an iterative approach, beginning with a small labeled dataset.
In each iteration, new examples are selected and added to the training set, gradually expanding
the dataset until it reaches the desired size. To ensure robustness in performance evaluation,
each active learning method is tested across multiple independent experiments. The pipeline
relies on BERT for both text embedding and classification.</p>
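        <p>The iterative process described above can be sketched as a short loop. In the sketch below a toy one-dimensional threshold classifier stands in for BERT, and entropy is used as the query score; the helper names (<monospace>train</monospace>, <monospace>predict_proba</monospace>, <monospace>active_learning_loop</monospace>), the synthetic pool and the oracle are all assumptions made for illustration only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def train(labeled):
    """Toy stand-in for fine-tuning: threshold at the midpoint of class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    mean_pos = sum(pos) / len(pos) if pos else 1.0
    mean_neg = sum(neg) / len(neg) if neg else 0.0
    return (mean_pos + mean_neg) / 2.0


def predict_proba(threshold, x):
    """Class probabilities from a logistic curve centred on the threshold."""
    p1 = 1.0 / (1.0 + math.exp(-10.0 * (x - threshold)))
    return [1.0 - p1, p1]


def entropy(probs):
    """Shannon entropy of a predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def active_learning_loop(pool, oracle, initial_size, batch_size,
                         target_size, rng):
    """Grow the labeled set from initial_size to target_size in batches."""
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Seed the labeled set with randomly chosen instances.
    labeled = [(x, oracle(x)) for x in unlabeled[:initial_size]]
    unlabeled = unlabeled[initial_size:]
    history = []
    while len(labeled) < target_size and unlabeled:
        model = train(labeled)
        # Rank the remaining pool by predictive entropy, most uncertain first.
        unlabeled.sort(key=lambda x: entropy(predict_proba(model, x)),
                       reverse=True)
        take = min(batch_size, target_size - len(labeled))
        queried, unlabeled = unlabeled[:take], unlabeled[take:]
        labeled.extend((x, oracle(x)) for x in queried)
        history.append(len(labeled))
    return train(labeled), history


rng = random.Random(0)
pool = [rng.random() for _ in range(200)]   # synthetic 1-D "documents"
oracle = lambda x: 1 if x > 0.6 else 0      # hypothetical annotator
model, history = active_learning_loop(pool, oracle, 10, 5, 40, rng)
```
]]></preformat>
        <p>Swapping the sort key for another score function is enough to plug a different query strategy into the same loop.</p>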
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation metrics</title>
        <p>We evaluated the performance of the models with Accuracy, Precision, Recall and F1. Furthermore,
we assessed the environmental and computational efficiency of our models using execution time
(measured in seconds), energy consumption (measured in kWh) and carbon emissions (measured
in kg), calculated with the CodeCarbon [30] library. This makes it possible to quantify the
sustainability of the training process by estimating the energy usage and the resulting carbon
footprint. To summarize predictive quality and sustainability in a single index, we use
the Efficiency Index, calculated as follows:</p>
        <p>Efficiency Index =


.</p>
        <p>(8)</p>
        <p>These metrics provide a more precise view of model performance, balancing
effectiveness with ecological impact, computational efficiency and classification accuracy.</p>
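        <p>For reference, the four classification metrics listed at the start of this subsection follow directly from the confusion counts. The sketch below is a plain-Python illustration; treating label 1 as the positive class and the example labels themselves are assumptions, not values from the experiments.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight articles.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```
]]></preformat>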
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental evaluation</title>
      <sec id="sec-4-1">
        <title>4.1. Test bed</title>
        <p>Datasets. To compare active learning methodologies across domains,
this study leverages two benchmark datasets that represent distinct domains of misinformation.
These datasets are diverse in their content and structure, making them well suited for evaluating the
adaptability and robustness of different active learning techniques. Below, we provide a detailed
description of each of them:</p>
        <p>• Politifact [31]: a dataset of political information. It comprises 600 labeled
instances annotated as true or false. The dataset includes claims made by politicians,
public figures and media. Politifact’s domain-specific nature makes it an excellent resource
for evaluating active learning in handling nuanced and context-heavy political discourse;</p>
        <p>• GossipCop [32]: focuses on misinformation in the entertainment and celebrity news
domain. The dataset is derived from the FakeNewsNet dataset [33] and contains 4365 instances
(not considering duplicates) with balanced representations of both true and false claims.
The specificity of this dataset allows for testing how well active learning models perform
in detecting misinformation in highly targeted domains.</p>
        <p>By utilizing these datasets, this study evaluates the performance of active learning techniques
across domains, each characterized by its own challenges and patterns of misinformation.</p>
        <p>Hyperparameter and experimental setup. The initial labeled dataset consists of 100
instances, with 20 new examples added per iteration until a total of 400 labeled instances is
reached. Given this setup, the process completes in 15 iterations. Each method is evaluated
through 7 independent experiments. The BERT model is trained using a batch size of 16, the
Adam [34] optimizer, a learning rate of 0.002, and 10 training epochs.</p>
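        <p>The iteration count stated above follows from the labeling budget: (400 − 100) / 20 = 15. A small sketch makes the schedule explicit; the function name <monospace>labeling_schedule</monospace> is a hypothetical helper introduced only for this illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def labeling_schedule(initial_size, batch_size, target_size):
    """Size of the labeled set after each active learning iteration."""
    sizes = []
    current = initial_size
    while current < target_size:
        current = min(current + batch_size, target_size)
        sizes.append(current)
    return sizes


# Values taken from the setup described in the text.
schedule = labeling_schedule(initial_size=100, batch_size=20, target_size=400)
n_iterations = len(schedule)
```
]]></preformat>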
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Test results</title>
        <p>Comparative analysis. Figure 1 shows the evaluation metrics for the Politifact dataset.
The x-axis represents the number of labeled examples, while the y-axis represents the value
of the corresponding metric. In terms of accuracy, Entropy Sampling is
generally the best strategy, achieving over 90% accuracy with around 250 examples. BADGE
outperforms Entropy Sampling only in the first iterations. For precision, Entropy
Sampling and CoreSet are the best strategies, both staying above 0.90 at the end of the curve. For
recall, BADGE leads over most of the range, but Entropy Sampling is equally competitive.
Finally, for the F1-score, Entropy Sampling is the most balanced strategy, with values above
0.90, followed by BADGE and CoreSet, which grow steadily. Overall,
Entropy Sampling and BADGE are the best-performing strategies across all metrics, while BALD is
the least effective.</p>
        <p>[Figure 1. Panel: Accuracy.]</p>
        <p>Figure 2 shows the evaluation metrics for the GossipCop dataset. The x-axis represents
the number of labeled examples, while the y-axis represents the value of the corresponding
metric. The BALD strategy was trained for only 10 iterations, as its training time was excessive.
In terms of accuracy, Entropy Sampling is generally the best strategy,
achieving over 74% accuracy with around 350 examples. All other strategies outperform Entropy
Sampling in the first iterations. For precision, Entropy Sampling is also the
best strategy, staying above 0.75. For recall, BADGE prevails in most cases despite
being trained for fewer iterations. Finally, for the F1-score, almost all models achieve
a similar result, with values above 0.76; BADGE and CoreSet follow with steady growth.
Overall, Entropy Sampling is the best-performing strategy across all metrics,
while BALD is the least effective.</p>
        <p>[Figure 2. Panel: Precision. Legend: Entropy Sampling, KMeans Sampling, Least Confidence, Badge, Bald, CoreSet. x-axis: number of labeled examples (100 to 400); y-axis: Precision score.]</p>
        <p>In the table 1, all the indices used for this analysis are reported for each dataset. For the
Politifact dataset the Entropy sampling method is still the best method both in terms of emissions
generated and in terms of eficiency index, it can be seen how the performances in terms of
execution time and energy consumed are very close to the best obtained, this confirms that for
the Politifact dataset the Entropy Sampling strategy is the best. In the Gossipcop dataset the
best strategy in terms of eficiency is Core-Set which achieves the best performances for each
index except for emissions generated where the K-means technique is the leader. Unlike the
performances for the Politifact dataset where the entropy sampling strategy was the best in the
eficiency aspect, with this dataset we find that entropy sampling remains the best in terms of
accuracy and other metrics but it is not the best in eficiency as k-means has higher values for
each index.
Dataset Impact The choice of initial instances is a very important element to take into
consideration when applying Active learning techniques, since they determine both the performance
and the results. During the experiments for this study it was noted that the accuracy and other
initial metrics had a high variance, this was due to the choice of the first 100 instances, if the
instances were able to adequately cover the distributions of the dataset, better performance was
achieved compared to other experiments where the performance was lower and more instances
were used to reach them:
• Politifact: this dataset consists of 600 instances of political statements labeled as true
or false. The best AL technique for this dataset was Entropy Sampling, which achieved very
high performance with 280 instances. To validate the benefit of this technique, the same
model used for the AL techniques was trained on the entire dataset, divided into a training
set (480 instances), a validation set (60 instances) and a test set (60 instances). As shown
in Table 2, the results achieved by the full model are lower on all metrics, which confirms
the importance of AL techniques that, by reducing the number of training instances, also
reduce execution time, emissions and energy consumption;
• GossipCop: this dataset is composed of 4365 instances and deals with gossip and
entertainment news. Unlike the previous dataset, here there are two active learning
techniques to consider: Entropy Sampling achieves better performance, while K-means
obtains good performance with greater efficiency. To validate them, these two techniques
were compared with a model trained on all the instances, split into 3492 training instances,
436 validation instances and 437 test instances. In this case the results achieved with AL
are slightly lower for K-means and almost identical for Entropy Sampling. These results are
excellent considering the number of training instances, since using these techniques
significantly reduces training time, emissions and energy consumption.</p>
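        <p>As a minimal sketch of the two ideas discussed above, the following Python code draws an initial labeled set whose class proportions mirror those of the whole pool, and computes the split sizes for a full-dataset baseline. The function names, the 80/10/10 proportions (inferred from the reported 480/60/60 and 3492/436/437 counts) and the class balance in the usage example are assumptions made for illustration; the sampling procedure actually used in the experiments is not described in this section.</p>
        <preformat><![CDATA[
```python
import random
from collections import defaultdict


def stratified_seed(labels, seed_size, rng):
    """Pick seed_size indices whose label proportions mirror the full pool."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Floor of the proportional quota, so quotas never exceed seed_size in total.
        quota = seed_size * len(indices) // len(labels)
        chosen.extend(rng.sample(indices, quota))

    # Rounding down may leave a few slots empty: fill them at random.
    taken = set(chosen)
    leftovers = [i for i in range(len(labels)) if i not in taken]
    chosen.extend(rng.sample(leftovers, seed_size - len(chosen)))
    return chosen


def split_sizes(n_total):
    """Return (train, validation, test) sizes for an assumed 80/10/10 split."""
    n_train = n_total * 8 // 10
    n_val = n_total // 10
    return n_train, n_val, n_total - n_train - n_val


# Hypothetical pool of 600 items with a 60/40 class balance (illustrative only).
labels = [0] * 360 + [1] * 240
seed = stratified_seed(labels, seed_size=100, rng=random.Random(0))
print(sum(labels[i] for i in seed))  # number of class-1 items in the seed set
print(split_sizes(600), split_sizes(4365))
```
]]></preformat>
        <p>Under these assumptions, split_sizes returns (480, 60, 60) for 600 instances and (3492, 436, 437) for 4365 instances, matching the counts reported above. A random, non-stratified draw of the first 100 instances would be the natural comparison point for studying the variance in the initial metrics.</p>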
        <p>Table 2 (cell values not recoverable from the extracted text). Columns: Dataset, Strategy, No. of instances, Accuracy, Recall, Precision, F1-score. Rows: Politifact with None (whole dataset) and Entropy Sampling; GossipCop with None (whole dataset), Entropy Sampling and K-means.</p>
        <p>
Insights From the results obtained, key insights can be drawn on the effectiveness and
efficiency of AL techniques applied to the classification of fake news:
• The choice of the initial instances in the training set influences the final performance of the
model. A more balanced training set that adequately represents the characteristics of
the entire dataset yields better results with fewer iterations;
• The results highlight the importance of the choice of the AL strategy for the final
performance. Entropy Sampling achieved better performance with fewer instances than the
other techniques, demonstrating its ability to select more informative instances. However,
for the GossipCop dataset, K-means achieved a higher overall efficiency, balancing
performance and consumption;
• After a certain number of iterations, adding new instances to the training set does not
lead to significant improvements. This highlights how AL techniques matter most in the
initial stages of training, reducing the number of instances to be labeled.</p>
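        <p>The second and third observations can be illustrated with a small sketch. The Python code below ranks unlabeled instances by the Shannon entropy of their predicted class probabilities and selects the most uncertain ones, then checks whether accuracy has stopped improving across rounds. The function names, the window and min_gain thresholds and the toy probabilities are hypothetical; this is one common formulation of entropy sampling and of a plateau check, not necessarily the exact procedure used in the experiments.</p>
        <preformat><![CDATA[
```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_sampling(pool_probs, batch_size):
    """Return indices of the batch_size pool items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: shannon_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def has_plateaued(accuracies, window=3, min_gain=0.005):
    """True when accuracy gained less than min_gain over the last `window` rounds."""
    if len(accuracies) <= window:
        return False
    return accuracies[-1] - accuracies[-1 - window] < min_gain


# Toy class probabilities for four unlabeled items (illustrative values only).
pool_probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]
print(entropy_sampling(pool_probs, batch_size=2))

# Toy accuracy history over successive AL rounds (illustrative values only).
history = [0.70, 0.80, 0.88, 0.90, 0.901, 0.902, 0.903]
print(has_plateaued(history))
```
]]></preformat>
        <p>In this toy example the two items closest to a 50/50 prediction (indices 0 and 2) are selected, and the accuracy history is flagged as having plateaued because the last three rounds add only about 0.003.</p>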
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and conclusion</title>
      <p>The aim of this study was to evaluate and compare different AL techniques for fake news
detection. To obtain a robust comparative analysis, two datasets representative of
different domains were selected. The experimental analysis highlighted significant differences
between the AL techniques tested. On the Politifact dataset, Entropy Sampling proved both
the most effective and the most efficient method, reaching an accuracy higher than 90%
with about 250 labeled instances and an Efficiency Index of 0.7234. The BADGE strategy
performed well in the early stages of training but lost effectiveness in later stages,
without further improving performance. On the GossipCop dataset, Entropy Sampling was confirmed
as the most robust method in terms of accuracy, exceeding 74% with 350 labeled examples,
while in terms of efficiency K-means obtained the best performance on almost
all indicators. These results confirm the value of AL in improving the computational
efficiency of fake news detection models by reducing the amount of input data needed for a
good classification. Overall, Entropy Sampling was the most effective technique for both
datasets, but its advantage was more marked in the political domain than in the entertainment
one. Furthermore, the sustainability analysis highlighted how alternative strategies, such as
K-means, can offer a better compromise between accuracy and computational efficiency.
Limitations This study offers useful insights on the topic, but several limitations should be
noted. First, the two datasets used might not accurately reflect the wide variety of fields
and situations in which these methods could be applied, so caution is needed when
transferring our findings to other fields. Second, our review does not cover all possible AL
methods: we focus on a selection of the most popular approaches, and other methods may yield
different or complementary results. Finally, this paper focuses exclusively on the BERT model.
More sophisticated models such as RoBERTa [35], Sentence Transformer [36] or other Large
Language Models were excluded because of the lack of computational resources.</p>
      <p>Future Works Future work could address these limitations by expanding the datasets to
cover a larger range of domains, exploring additional Active Learning strategies and integrating
a wider array of classification algorithms to assess their efficacy in similar tasks. Another
direction is to integrate the neural approach for detecting balance-aware polarized
communities, as proposed by Gullo et al. [37], with active learning strategies to enhance fake
news detection. This integration could lead to more efficient identification of misinformation
by focusing on influential nodes within these communities. Furthermore, we will focus on
developing a framework for fake news detection that is both cross-domain and multimodal.
Current approaches primarily analyze textual data within a single domain, but misinformation
often spans multiple domains and media types. A promising direction is the integration of
graph-based structures to model relationships between different content modalities, such as
text, images, and metadata. Graph neural networks and knowledge graphs can help identify
hidden connections, enhancing detection capabilities. Incorporating Active Learning into this
framework will further optimize data annotation, improving efficiency and scalability. This
approach aims to provide a more holistic and adaptable system for misinformation detection,
addressing the growing complexity of fake news dissemination.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>SF, MG, and DM are partly funded by the PRIN MIRFAK project (H53D23008120001). FS is
partially supported by the research project FAIR (PE00000013) Spoke 9 - Green-aware AI, under the
NRRP (National Recovery and Resilience Plan) MUR program funded by NextGenerationEU.
The aforementioned funders had no role in data collection and analysis, decision to publish, or
preparation of the manuscript.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>Symposium of Advanced Database Systems, Galzignano Terme, Italy, July 2nd to 5th, 2023,
volume 3478 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 535–544.
[9] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, A meta-active learning approach exploiting
instance importance, Expert Systems with Applications 247 (2024) 123320.
doi:10.1016/j.eswa.2024.123320.
[10] M. Sahan, V. Smidl, R. Marik, Active learning for text classification and fake news detection,
in: Int. Symposium on Computer Science and Intelligent Controls (ISCSIC), IEEE, 2021, pp.
87–94.
[11] L. Martirano, P. Zicari, M. Guarascio, S. F. Pisani, C. Comito, You can spread but you cannot
hide: Discovering accurate multi-modal deep fusion models for fake news detection, in:
Proc. of the 13th International Conference on Complex Networks and their Applications
(CNA), 2025, pp. 108–111.
[12] L. L. Cava, D. Costa, A. Tagarelli, Is contrasting all you need? contrastive learning for the
detection and attribution of ai-generated text, in: U. Endriss, F. S. Melo, K. Bach, A. J. B.
Diz, J. M. Alonso-Moral, S. Barro, F. Heintz (Eds.), ECAI 2024 - 27th European Conference
on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain - Including
13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), volume
392 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2024, pp. 3179–3186.
URL: https://doi.org/10.3233/FAIA240862. doi:10.3233/FAIA240862.
[13] S. D. Bhattacharjee, A. Talukder, B. V. Balantrapu, Active learning based news veracity
detection with feature weighting and deep-shallow fusion, in: 2017 IEEE International
Conference on Big Data (Big Data), IEEE, 2017, pp. 556–565.
[14] P. Farinneya, et al., Active learning for rumor identification on social media, in: Findings
of the association for computational linguistics: EMNLP, 2021, pp. 4556–65.
[15] H. S. Seung, M. Opper, H. Sompolinsky, Query by committee, in: Proc. of the Fifth Annual
Workshop on Computational Learning Theory, COLT ’92, ACM, New York, NY, USA, 1992,
p. 287–294.
[16] K. Lee, G. Mou, S. Sievert, Energy-based domain adaption with active learning for emerging
misinformation detection, in: IEEE Int. Conference on Big Data (Big Data), 2022, pp. 2305–08.
[17] S. Kato, L. Yang, D. Ikeda, Domain bias in fake news datasets consisting of fake and real
news pairs, in: 12th Int. Congress on Advanced Applied Informatics (IIAI-AAI), 2022, pp.
101–106.
[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
transformers for language understanding, in: North American Chapter of the Association for
Computational Linguistics, 2019. URL: https://api.semanticscholar.org/CorpusID:52967399.
[19] G. Barnabò, et al., Deep active learning for misinformation detection using geometric deep
learning, Online Social Networks and Media 33 (2023) 100244.
[20] F. Folino, G. Folino, M. Guarascio, L. Pontieri, P. Zicari, Towards data- and compute-efficient
fake-news detection: An approach combining active learning and pre-trained language
models, SN Computer Science 5 (2024) 470.
[21] A. Avignone, A. Fiori, S. Chiusano, G. Rizzo, Generation of textual/video descriptions
for technological products based on structured data, in: 2023 IEEE 17th International
Conference on Application of Information and Communication Technologies (AICT), 2023,
pp. 1–7. doi:10.1109/AICT59525.2023.10313177.
[22] M. Cevallos, M. De Biase, E. Vocaturo, E. Zumpano, Fake news detection on covid 19
tweets via supervised learning approach, in: 2022 IEEE International Conference on
Bioinformatics and Biomedicine (BIBM), 2022, pp. 2765–2772. doi:10.1109/BIBM55620.2022.9994918.
[23] D. D. Lewis, W. A. Gale, A sequential algorithm for training text classifiers, in: B. W. Croft,
C. J. van Rijsbergen (Eds.), SIGIR ’94, Springer London, London, 1994, pp. 3–12.
[24] C. E. Shannon, A mathematical theory of communication, The Bell System Technical
Journal 27 (1948) 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.
[25] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations
of words and phrases and their compositionality, in: C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, K. Weinberger (Eds.), Advances in Neural Information Processing Systems,
volume 26, Curran Associates, Inc., 2013. URL:
https://proceedings.neurips.cc/paper_files/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf.
[26] Z. Xu, K. Yu, V. Tresp, X. Xu, J. Wang, Representative sampling for text classification
using support vector machines, in: Advances in Information Retrieval: 25th European
Conference on IR Research, ECIR 2003, Pisa, Italy, April 14–16, 2003. Proceedings 25,
Springer, 2003, pp. 393–407.
[27] N. Houlsby, F. Huszár, Z. Ghahramani, M. Lengyel, Bayesian active learning for
classification and preference learning, 2011. URL: https://arxiv.org/abs/1112.5745.
arXiv:1112.5745.
[28] X. Cao, I. W. Tsang, Bayesian active learning by disagreements: A geometric perspective,
2021. URL: https://arxiv.org/abs/2105.02543. arXiv:2105.02543.
[29] J. T. Ash, C. Zhang, A. Krishnamurthy, J. Langford, A. Agarwal, Deep batch active learning
by diverse, uncertain gradient lower bounds, 2020. URL: https://arxiv.org/abs/1906.03671.
arXiv:1906.03671.
[30] B. Courty, et al., mlco2/codecarbon: v2.4.1, 2024. URL: https://doi.org/10.5281/zenodo.11171501.
doi:10.5281/zenodo.11171501.
[31] N. Vo, K. Lee, Where are the facts? searching for fact-checked information to alleviate the
spread of fake news, arXiv preprint arXiv:2010.03159 (2020).
[32] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, H. Liu, Fakenewsnet: A data repository with
news content, social context and spatialtemporal information for studying fake news on
social media, 2019. URL: https://arxiv.org/abs/1809.01286. arXiv:1809.01286.
[33] K. Shu, G. Zheng, Y. Li, S. Mukherjee, A. H. Awadallah, S. Ruston, H. Liu, Leveraging
multi-source weak social supervision for early detection of fake news, arXiv preprint
arXiv:2004.01732 (2020).
[34] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014. URL:
http://arxiv.org/abs/1412.6980. arXiv:1412.6980. Published as a conference paper at the
3rd International Conference for Learning Representations, San Diego, 2015.
[35] Y. Liu, et al., Roberta: A robustly optimized bert pretraining approach, 2019. URL:
https://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[36] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks,
2019. URL: https://arxiv.org/abs/1908.10084. arXiv:1908.10084.
[37] F. Gullo, D. Mandaglio, A. Tagarelli, Neural discovery of balance-aware polarized
communi</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <article-title>An overview of online fake news: Characterization, detection, and discussion</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>57</volume>
          (
          <year>2020</year>
          )
          <fpage>102025</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Beyond news contents: The role of social context for fake news detection</article-title>
          ,
          <source>in: Proceedings of the twelfth ACM international conference on web search and data mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pennycook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bear</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Rand</surname>
          </string-name>
          ,
          <article-title>The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings</article-title>
          ,
          <source>Management science</source>
          <volume>66</volume>
          (
          <year>2020</year>
          )
          <fpage>4944</fpage>
          -
          <lpage>4957</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lewandowsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Ecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Seifert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schwarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <article-title>Misinformation and its correction: Continued influence and successful debiasing</article-title>
          ,
          <source>Psychological science in the public interest</source>
          <volume>13</volume>
          (
          <year>2012</year>
          )
          <fpage>106</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. M. J.</given-names>
            <surname>Lazer</surname>
          </string-name>
          , et al.,
          <article-title>The science of fake news</article-title>
          ,
          <source>Science</source>
          <volume>359</volume>
          (
          <year>2018</year>
          )
          <fpage>1094</fpage>
          -
          <lpage>1096</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <article-title>Active Learning Literature Survey</article-title>
          ,
          <source>Technical Report</source>
          , University of Wisconsin-Madison Department of Computer Sciences,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Flesca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <article-title>Learning to active learn by gradient variation based on instance importance</article-title>
          ,
          <source>in: 2022 26th International Conference on Pattern Recognition (ICPR)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2224</fpage>
          -
          <lpage>2230</lpage>
          . doi:10.1109/ICPR56361.2022.9956039.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Flesca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <article-title>A meta-active learning approach exploiting instance importance based on learning gradient variation</article-title>
          ,
          <source>in: Proceedings of the 31st</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>