
February

1613-0073

News Detection Exploiting Ensemble Learning Techniques

Farwa Batool

farwa.batool@imtlucca.it 0

Giuseppe Lo Re

giuseppe.lore@unipa.it 1

Marco Morana

marco.morana@unipa.it 1

Mario Tortorici

Workshop

0 Scuola IMT Alti Studi Lucca , Lucca , Italy 1 Università degli Studi di Palermo , Palermo , Italy

2025


Traditional fake news detection methods rely on machine learning models such as Decision Trees, Random Forests, and SVMs, using features such as word count and term frequency. These methods struggle to capture the nuanced features of fake news, especially as online content evolves. To overcome these limitations, this paper proposes a Multi-View Ensemble classifier that takes domain knowledge into account during classification, a critical feature for fake news detection. The multi-view approach allows the classifiers to identify patterns in different aspects of a news item that traditional methods might miss. The proposed ensemble method uses a weighted voting strategy to combine the results of multiple classifiers. Introducing domain knowledge allows the classifiers to generalize better across the rapidly evolving domains of fake news. The weighted voting strategy proved more efficient than the other voting approaches, and the achieved results surpassed those of a reference state-of-the-art model.

Keywords: domain knowledge, fake news detection, ensemble model, machine learning, scalability

1. Introduction

Over the past decade, digital transformation has radically changed the information landscape, giving rise to an era of instant access to news and updates through online platforms. While this shift has introduced significant advantages, it has also expanded the dissemination of fake news. Social media in particular has made it quick and easy to share information, often without proper source verification. Fake news is designed to appear authentic and is spread for various reasons. According to [1], the possible motivations include harming individuals, organizations, or other entities; manipulating public opinion on certain topics; or simply entertainment. Adding to this problem, the inability of digital platforms to effectively control and counter fake news poses a significant challenge. The issue raises a critical question: how can real news be distinguished effectively from fake news in a timely manner? Traditional manual verification techniques, although accurate, do not provide the scalability required to deal with the volume of digital information. As a result, the scientific community has shifted its focus to automating the news verification process using Artificial Intelligence (AI) and Machine Learning (ML) models [2]. These technologies enable the analysis of large datasets in real time [3], making automated verification one of the most relevant areas of research.

Previous studies often rely on a single view for classification, overlooking the intricate patterns that fake news contains. A news item may be more representative in certain aspects, such as semantics and emotions, than in others, such as writing style. Considering only one aspect for all news items can lead to ineffective model performance, reducing the model's ability to generalize and accurately detect fake news. The other gap in the literature is the lack of effective consideration of the domain to which a news item belongs, which neglects the importance of domain knowledge in the classification process. Fake news may contain domain-specific information, which requires an understanding of the context to be detected as fake. To address these issues, this work introduces an advanced machine learning model based on ensemble techniques, capable of accurately recognizing fake news and competing with state-of-the-art models. More precisely, the contributions of this work are the following:
• To break down each news item into three types of features (semantic, emotional, and stylistic), capturing a comprehensive view of the content.
• To develop an ensemble model that integrates well-established fake news detection models, aiming to maximize classification accuracy.
• To integrate domain knowledge into the classification process, emphasizing the most relevant features in the context of fake news, to obtain more accurate classification results.

The remainder of this paper is organized as follows: related work is described in Section 2. The detailed methodology of this work is explained in Section 3. Experimental settings and results are discussed in Section 4. Conclusions are given in Section 5.

2. Related Work

Extensive research has concentrated on the analysis and detection of fake news [4, 5, 6] based on the examination of various features, generally classified into social context-based [7], user-based [8], or content-based [9] characteristics. The first set of features primarily focuses on the propagation dynamics of news within social networks, including the speed and patterns of its spread. User-based features involve assessing the credibility of the individuals sharing or posting news. Content-based features, on the other hand, emphasize attributes of news articles, including their text, linguistic style, and source credibility.

In dealing with this last set, researchers have proposed various techniques, including knowledge graphs [10], n-grams with Term Frequency-Inverse Document Frequency (TF-IDF) and a Gradient Boosting classifier [11], and transformers [12]. The most significant progress in effectively extracting semantic context from texts has been made by using models pre-trained on large corpora, such as BERT [13] and RoBERTa [14]. Zhang et al. [15] analyzed how influential the emotions extracted from comments were and how they related to the emotions extracted from the news text. The authors define two types of emotions: those conveyed by the person writing the news (i.e., the publisher emotion) and those that the news aroused in users (i.e., the social emotion). The dual emotion, given by the union of publisher and social emotion, is then integrated into existing models for fake news detection (e.g., BERT), showing significantly superior performance.

Other works highlighted the importance of integrating domain knowledge into fake news detection, as understanding typical phrasing, tone, and factual structure within a specific domain can significantly improve the detection of false claims in that area. However, fake news can generally be associated with multiple domains, such as politics, health, entertainment, and education, making it essential to consider cross-domain features for better model performance [16]. While substantial research has been conducted on single-domain fake news detection [17, 18, 19], these models perform poorly on unseen or new domains. This limitation arises because the models are trained on domain-specific features and struggle to generalize to diverse domains.

To address this problem, Qi et al. [20] presented a Multi-domain Visual Neural Network (MVNN) that uses the visual content of fake news to classify images as fake or real based on the frequency domain and the pixel domain. Wang et al. [21] proposed soft-label multi-domain fake news detection (SLFEND), which uses two Chinese text-based datasets to extract multi-domain features and an MLP for the classification task. Their experimental results show a significant improvement on the Weibo dataset. The work in [16] introduced a novel framework that preserves both domain-specific and cross-domain knowledge using independent embedding spaces. Additionally, an unsupervised instance selection technique is proposed to optimize the selection of news records for manual labeling. The results show that the approach improves detection accuracy on cross-domain datasets, achieving state-of-the-art performance, especially on rarely seen domains.

[Figure 1: Two-level multi-view feature extraction. Level 0 features: semantic (RoBERTa BPE tokens, e.g., [CLS], token1, token2), emotional (Emotional Lexicon from NRC-EL, Emotional Intensity from NRC-EIL, Emotional Score from VADER, Auxiliary Features including Wikipedia emoticons), and stylistic (Readability, Credibility, Attractiveness, Sensitivity). Level 1 feature extractors: SemExtractor (word embedding, convolutional layer, max pooling layer), EmoExtractor (hidden layers), and StyExtractor (hidden layers).]

Nan et al. [22] exploited the power of BERT to generate embeddings from news texts belonging to different domains. Building on [22] and [15], Zhu et al. proposed an integrative framework called M3FEND for automatic multi-domain fake news detection [23]. They identified two main challenges: domain shift and the incompleteness of domain labels. Domain shift means that fake news can evolve rapidly, resulting in a change in data distribution; it is therefore necessary to improve the generalization capacity of models. Incompleteness of domain labels means that a news item is labeled as belonging to only one domain even though it can deal with topics from multiple fields; for example, a news item from the world of politics could also concern health care. The model addressed these challenges by applying a multi-view, multi-domain approach. Three types of features (semantics, emotions, and styles) were extracted by a Multi-channel Multi-view Extractor, which extracts information from different representations of the news. To enrich the information coming from the domain, a component called Domain Memory Bank was used, in which the relevant characteristics of each domain were collected and stored. A Domain Adapter aggregated these representations and modeled the discrepancy between domains.

Although these models exhibit good performance, domain alignment and domain label assignment increase their cost [24]. Additionally, a news item can belong to more than one domain while having limited relevance to the other news items in that domain [25]. In this case, forcing the models to learn and annotate the domain labels causes a domain bias. To address these problems, we propose a simple ensemble model with voting strategies whose weights are assigned based on the respective domains of the news items. Ensemble methods are preferred because individual classifiers are prone to risks such as variance, bias, or over-fitting. Ensembles, in contrast, have consistently outperformed individual classifiers in various applications such as sentiment analysis [26], anomaly detection [27], and intrusion detection [28, 29].

3. Methodology

3.1. Multi-Domain Feature Extraction

Similar to [23], the feature extraction method operates at two levels. First, three kinds of basic features, i.e., semantic, emotional, and stylistic, are extracted from the text. These are then propagated into three distinct deep extractors in order to provide higher-level representations. The whole process is summarized in Figure 1, where the semantic, emotional, and stylistic feature extractions are represented by blue, orange, and green modules, respectively.

3.1.1. Level 0 Features

For the semantic view, the pre-trained RoBERTa (Robustly Optimized BERT Pretraining Approach) tokenizer is employed, similar to the approach used in [22]. The content of each news item is tokenized using Byte-Pair Encoding (BPE), which keeps common words intact and breaks rare or unseen words down into sub-units. Each generated token is mapped into a dictionary, with special tokens such as [CLS] for the start of the sequence, [SEP] for the separation between sentences, and [PAD] for padding to ensure a uniform sequence length. A maximum length of 300 is set, reflecting the character length constraint typical of social media platforms such as X. Additionally, an attention mask is generated to distinguish the actual tokens from padding: the mask is set to 1 for real tokens and 0 for padding tokens.
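The padding and attention-mask step can be illustrated with a minimal sketch. It does not call the RoBERTa tokenizer: the token IDs and the padding ID (`PAD_ID`) below are hypothetical placeholders, and only the maximum length of 300 and the 1/0 mask convention come from the text.

```python
# Toy illustration of truncation, padding, and attention masks.
# PAD_ID and the example token IDs are hypothetical; a real tokenizer
# supplies its own vocabulary and special-token IDs.
MAX_LEN = 300
PAD_ID = 1


def pad_and_mask(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or pad token_ids to max_len and build the attention mask."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)           # 1 marks a real token
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


ids, mask = pad_and_mask([0, 713, 16, 2], max_len=8)
print(ids)   # [0, 713, 16, 2, 1, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Sequences longer than the limit are simply cut here; whether the original pipeline truncates in the same way is not stated in the text.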

The emotional view captures the feelings of both the author and the reader towards the topic; integrating this information therefore introduces patterns that are useful for detecting fake news. Here, 38 emotional features are extracted and grouped into four categories:
• Emotional Lexicon: This set captures emotions conveyed in the text through specific words. The NRC¹ Emotion Lexicon (NRC-EL) is used, which associates words with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two feelings (negative and positive). For each word in the lexicon, a flag (with value 0 or 1) denotes its association with a given emotion.
• Emotional Intensity: Each word in this set is given a score (between 0 and 1) depending on the strength of the emotion it conveys. The NRC Emotion Intensity Lexicon (NRC-EIL) is used in this step.
• Emotional Score: This set measures the impact of the emotion related to the text, using numerical values obtained through the Valence Aware Dictionary and sEntiment Reasoner (VADER) package of the NLTK library.
• Auxiliary Features: The aim is to extract the characteristics of non-verbal elements, such as emoticons, punctuation, and capital letters. Emoticons from Wikipedia² are used in this step.
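As a rough illustration of the lexicon and auxiliary categories, the sketch below computes 0/1 emotion flags and a few non-verbal counts. The two-word `TOY_LEXICON` is hypothetical (the real NRC-EL is far larger), and aggregating word-level flags into one flag per text is an assumption, since the text does not state how flags are combined.

```python
# Hypothetical two-word lexicon standing in for NRC-EL.
TOY_LEXICON = {
    "outrage": {"anger", "negative"},
    "hope": {"anticipation", "joy", "positive"},
}
EMOTIONS = ["anger", "fear", "anticipation", "trust", "surprise",
            "sadness", "joy", "disgust", "negative", "positive"]


def lexicon_flags(text):
    """Return one 0/1 flag per emotion: 1 if any word in text carries it."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    present = set()
    for word in words:
        present |= TOY_LEXICON.get(word, set())
    return [1 if emotion in present else 0 for emotion in EMOTIONS]


def auxiliary_counts(text):
    """Count a few non-verbal cues: '!' marks, '?' marks, capital letters."""
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "capitals": sum(ch.isupper() for ch in text),
    }


print(lexicon_flags("Outrage and HOPE!"))  # [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(auxiliary_counts("Outrage and HOPE!"))
```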

The stylistic view captures the fact that fake news authors have distinct linguistic patterns, i.e., they need to adopt a particular writing style to convince readers of authenticity and achieve high engagement. Analyzing these patterns can therefore help in fake news detection [18]. Based on [30], a total of 18 stylistic features were extracted and grouped into four categories, i.e., readability, credibility, attractiveness, and sensitivity of a text.

¹ https://github.com/RMSnow/WWW2021/tree/master/resources/English/NRC
² https://en.wikipedia.org/wiki/List_of_emoticons

3.1.2. Level 1 Features

The Level 0 features extracted so far are propagated into deep extractors, called SemExtractor, EmoExtractor, and StyExtractor. The SemExtractor module employs a TextCNN model, which receives the Level 0 features as inputs and uses the pre-trained RoBERTa model to generate word embeddings. Convolutional filters are then applied, followed by a max pooling operation, to produce deep semantic features. The resulting output is a feature vector $r_{sem}$ having a dimension of 320 (64 feature maps x 5 filters). The EmoExtractor, designed for deep emotional representations, uses a Multilayer Perceptron (MLP) consisting of three layers. The input to the network is the Level 0 emotional features. The first hidden layer expands the initial 38-dimensional vector into a 256-dimensional representation, using the ReLU activation function for nonlinearity. The second hidden layer then further expands the feature vector to a 320-dimensional representation. The output is a feature vector denoted as $r_{emo}$. Similar to the EmoExtractor, the StyExtractor also consists of an MLP architecture with three layers; the difference is that the initial stylistic feature vector is 18-dimensional, and its output is denoted as $r_{sty}$. All the resulting feature vectors are 320-dimensional representations of the same news item from different points of view.
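The dimensions of the EmoExtractor can be traced with a small forward-pass sketch. Only the layer sizes (38, 256, 320) and the ReLU after the first hidden layer come from the text; the random weights stand in for trained parameters, and leaving the second layer without an activation is an assumption.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero bias (placeholder for trained values)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w1, b1 = init_layer(38, 256, rng)    # 38 emotional features -> 256
w2, b2 = init_layer(256, 320, rng)   # 256 -> 320

level0 = [rng.random() for _ in range(38)]  # stand-in Level 0 vector
hidden = relu(linear(level0, w1, b1))
r_emo = linear(hidden, w2, b2)
print(len(hidden), len(r_emo))  # 256 320
```

Replacing 38 with 18 in the first layer gives the corresponding shape for the StyExtractor.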

3.2. Multi-view Ensemble Classifier (MEC)

The proposed model employs a multi-view approach to detect fake news by integrating the three types of deep features ($r_{sem}$, $r_{emo}$, $r_{sty}$). The goal is to analyze each news item from different perspectives to obtain a more accurate classification through an ensemble-based system of classifiers. For each feature group, multiple machine learning models, including Decision Tree, Random Forest, Support Vector Machine (SVM), and Multilayer Perceptron (MLP), are used as the base learners within the ensemble. Soft voting is then applied to aggregate the probabilities produced within each ensemble. Lastly, to combine the predictions from the chosen classifiers, three voting strategies have been evaluated, namely hard, soft, and weighted voting.

In the case of hard voting, each of the n ensembles provides a discrete prediction $\hat{y}_i$ for each sample x. The final decision is determined by choosing the class $\hat{y}$ that obtains the majority of predictions. In soft voting, instead, each of the n ensembles returns a probability $P_i(y = c \mid x)$ indicating that sample x belongs to class $c \in \{0, 1\}$. The final prediction is based on the arithmetic mean of the probabilities provided by each ensemble:

$$\hat{y} = \arg\max_{c} \frac{1}{n} \sum_{i=1}^{n} P_i(y = c \mid x).$$
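The difference between the two strategies can be seen in a short sketch. The probabilities are made-up values chosen so that the two rules disagree; `probabilities[i][c]` stands for the probability that ensemble i assigns to class c.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority class among discrete predictions (ties go to the first seen)."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Return the class with the highest mean probability across ensembles."""
    n = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Illustrative outputs of three ensembles for one sample (not real results).
probs = [[0.45, 0.55], [0.40, 0.60], [0.90, 0.10]]
labels = [max(range(2), key=p.__getitem__) for p in probs]

print(labels)             # [1, 1, 0]
print(hard_vote(labels))  # 1
print(soft_vote(probs))   # 0
```

Two weakly confident ensembles outvote one confident ensemble under hard voting, whereas soft voting lets the confident ensemble prevail.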

The weighted voting strategy is similar to soft voting, but weights are assigned based on the domain d to which a given sample belongs. The weights are determined during the validation phase, based on the intermediate F1-scores and Accuracy metrics [31].

For each domain d, a weight vector $w_d = [w_{d,sem}, w_{d,emo}, w_{d,sty}]$ is created, representing the influence of the semantic, emotional, and stylistic views on classification within that domain. Each vector is normalized so that its weights sum to 1. The vectors obtained for the individual domains form a weight matrix W consisting of m rows, corresponding to the domains, and n columns, corresponding to the views. For a three-domain problem, a 3x3 matrix is generated as follows:

$$W = \begin{bmatrix} w_{0,sem} & w_{0,emo} & w_{0,sty} \\ w_{1,sem} & w_{1,emo} & w_{1,sty} \\ w_{2,sem} & w_{2,emo} & w_{2,sty} \end{bmatrix}$$

In the proposed experimental scenario, two separate matrices, one for F1-scores and one for accuracies, are considered. During the final prediction phase, assuming that the news item to be classified belongs to the j-th domain, the corresponding weight vector $w_j$ is extracted from W and used to weight the outputs of the three views.

[Figure: Multi-view Ensemble Classifier. Each deep feature vector ($r_{sem}$, $r_{emo}$, $r_{sty}$) feeds an ensemble of base classifiers whose probabilities are combined by soft voting; the three results are then combined by the chosen voting strategy (for weighted voting, with weights $w_{sem}$, $w_{emo}$, $w_{sty}$) to produce the output.]
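A minimal sketch of domain-dependent weighted voting follows. The validation F1-scores are hypothetical numbers, not results from the paper, and deriving the weights by dividing each score by the row sum is an assumption; the text only states that weights come from validation F1-scores or accuracies and that each row sums to 1.

```python
DOMAINS = ["gossipcop", "politifact", "covid"]
VIEWS = ["sem", "emo", "sty"]

# Hypothetical validation F1-scores per domain, in VIEWS order.
val_f1 = {
    "gossipcop": [0.60, 0.70, 0.70],
    "politifact": [0.90, 0.45, 0.45],
    "covid": [0.50, 0.25, 0.25],
}


def normalize(scores):
    """Scale scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


# Weight matrix W: one normalized row per domain.
W = {d: normalize(val_f1[d]) for d in DOMAINS}


def weighted_vote(view_probs, domain):
    """view_probs[v][c] is the probability of class c from view v."""
    w = W[domain]
    n_classes = len(view_probs[0])
    score = [sum(w[v] * view_probs[v][c] for v in range(len(w)))
             for c in range(n_classes)]
    return max(range(n_classes), key=score.__getitem__)


sample = [[0.2, 0.8], [0.7, 0.3], [0.7, 0.3]]  # sem, emo, sty
print(weighted_vote(sample, "politifact"))  # 1
print(weighted_vote(sample, "gossipcop"))   # 0
```

The same view outputs lead to different decisions because the semantic view carries more weight in one domain than in the other.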

4. Experimental Analysis and Discussion

To conduct the experiments, a custom dataset was created by merging data from two existing datasets: FakeNewsNet and MM-COVID. FakeNewsNet is one of the most widely used datasets in research and includes 23194 news items from two domains: entertainment (GossipCop) and politics (PolitiFact). MM-COVID is a multilingual and multimodal dataset designed for news related to the COVID-19 pandemic. The news items have been labeled as true or false by reliable fact-checking sources, such as Snopes and the International Fact-Checking Network (IFCN). The dataset consists of news items in six main languages: English, Spanish, Portuguese, French, Hindi, and Italian. In particular, there are about 4000 news items in English, equally balanced between real and fake. Note that, because the datasets are old, most user accounts were no longer available and, owing to privacy policies, the complete dataset could not be retrieved for this research. The comparison with state-of-the-art models is therefore also carried out on the available data only.

Given the strong imbalance in the dataset, sub-sampling was performed to ensure a homogeneous distribution of real and fake news. The final dataset thus consists of a total of 14000 news items belonging to three different domains: GossipCop, PolitiFact, and COVID. Each news item is characterized by its id, content, label, and domain. The label distribution is 50% fake news and 50% real news. In terms of domains, 76% of the samples come from GossipCop, 6.2% from PolitiFact, and 17.8% from the COVID dataset.

Table 1: Configurations of the classifiers.

Classifier      Hyperparameters
Random Forest   n_estimators=1000, max_depth=20
SVM             kernel=poly, C=1, gamma=1
MLP             …

The experiments were conducted with a 60-20-20 train-test-validation split, and a grid search with k-fold cross-validation (k = 5) was employed to optimize the hyperparameters of each classification model. The optimal hyperparameters identified were then used to train the base classifiers for each feature group. The configurations of the classifiers are given in Table 1. Additionally, a fixed random seed (random_state=2024) was set to ensure experimental reproducibility.
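A reproducible 60-20-20 split can be sketched as follows. The helper `split_60_20_20` is hypothetical and uses a plain shuffle; whether the original split was stratified by label or domain is not stated, so no stratification is assumed here. Only the proportions, the seed 2024, and the dataset size of 14000 come from the text.

```python
import random


def split_60_20_20(items, seed=2024):
    """Shuffle a copy of items and cut it into 60/20/20 parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train, val, test = split_60_20_20(range(14000))
print(len(train), len(val), len(test))  # 8400 2800 2800
```

Calling the function twice with the same seed returns identical partitions, which is the property the fixed random seed is meant to guarantee.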

The first experiment aimed to evaluate the performance of the proposed architecture using a balanced dataset (50% real and 50% fake) of news uniformly taken from the GossipCop, PolitiFact, and COVID domains (i.e., 33.33% from each dataset). The objective was to evaluate the performance of the model in the absence of any bias resulting from a non-uniform distribution of samples. The results obtained by applying the various voting strategies are reported in Table 2 with the best metrics highlighted in bold.

As can be observed, the PolitiFact domain was consistently classified with high accuracy across all voting strategies, most likely because the news items belonging to this domain have a more structured nature. Good results were also obtained for the COVID domain, whilst GossipCop posed a challenge to the model. The probable reason is that news in this domain is less predictable, so the model would need to be trained on a larger number of samples to capture these aspects. From an overall performance analysis, the proposed weighted voting strategies (both accuracy-based and F1-based) emerged as the most balanced and best performing, achieving the highest average accuracy and F1-score over all domains. At the same time, the soft voting strategy also proved to be a good alternative, consistently performing well compared with hard voting, which is the least flexible of the strategies.

In the second experiment, the entire available dataset was used, which showed a balance for the classification labels (50% real and 50% fake) and an imbalance for the domain labels (GossipCop domain (76%), COVID (17.8%) and PolitiFact (6.2%)).

The results, reported in Table 3, are significantly better for GossipCop, with a slight reduction in accuracy for the other two domains. This turns out to be predictable because of both a significant increase in the number of samples and the unbalanced distribution of the domains. Notably, the increased accuracy in the GossipCop domain supports the hypothesis from experiment 1: with a more significant number of samples, the model is capable of capturing the complex patterns, thereby improving classification performance.

The third experiment presented a more realistic scenario, in which real news is present in a greater quantity than fake news, i.e., 59% vs. 41%. The results, presented in Table 4, show that the weighted voting strategies again outperform the others across multiple datasets. Here, the hard and soft voting strategies provided competitive performance but were unable to capture nuanced patterns, for example in the GossipCop domain, because of its low number of samples. While earlier studies [32] suggested that weight estimation might be challenging, the proposed approach achieves competitive or better F1-scores and accuracies, demonstrating that weighted voting performs better than simple voting. The reason is that different domains exhibit distinct properties in terms of semantic, emotional, and stylistic features. This overall analysis suggests that weighted ensemble methods, especially those based on F1-scores, are well suited for multi-domain fake news detection, where cross-domain features must be effectively accounted for.

Finally, in order to demonstrate the effectiveness of the proposed architecture with respect to the state of the art, Table 5 compares the overall performance of MEC with that of the M3FEND model. It is worth noting that M3FEND has been tested on the same dataset and features used to evaluate MEC; thus, the results are slightly different from those reported in [23]. As can be observed, MEC achieves better performance on all four metrics, with a few exceptions where the values are almost comparable.

5. Conclusion

In this study, machine learning techniques were evaluated for their effectiveness in detecting fake news, leveraging state-of-the-art methodologies. A novel architecture based on ensemble machine learning techniques was proposed for multi-view fake news recognition. The model analyzes each news item from three perspectives: semantics, emotions, and writing style. These perspectives are used to extract basic features, which are further processed by deep feature extractors to generate more complex features. These comprehensive feature vectors provide a multi-faceted representation of news items, enabling the model to incorporate domain-specific knowledge effectively. The aim of the ensemble model is to mitigate the bias and errors inherent in the individual classifiers. The ensemble model therefore aggregates the predictions from multiple classifiers, returning the result with the highest probability. This is followed by an advanced voting strategy that integrates domain knowledge. This approach not only provides robust predictions but also ensures the model's ability to generalize to diverse domains.

The results demonstrated the model's efficacy in various dataset configurations, outperforming comparable approaches in the literature and setting a new benchmark for accuracy and performance in fake news detection. The domain-dependent weighted F1 voting strategy showed particular promise for real-world applications.

Future research directions include integrating a domain prediction component into the model to automate the domain identification process, enhancing accuracy while maintaining scalability. Additionally, expanding datasets to include a wider range of domains and incorporating user comments from social media platforms could provide richer insights and uncover latent patterns, further advancing the model’s capabilities.

Acknowledgments

This work was partially supported by the AMELIS project, within the project FAIR (PE0000013), and by the ADELE project, within the project SERICS (PE00000014), both under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU.

Declaration on Generative AI

The author(s) have not employed any Generative AI tools.

References

[1] Zannettou, Sirivianos, Blackburn, Kourtellis, The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans, Journal of Data and Information Quality (JDIQ) 11 (2019) 1–37.
[2] Batool, Canino, Concone, G. Lo Re, Morana, A black-box adversarial attack on fake news detection systems (2022).
[3] Concone, A. De Paola, G. Lo Re, M. Morana, Twitter analysis for real-time malware discovery, in: 2017 AEIT International Annual Conference, IEEE, 2017, pp. 1–6.
[4] P. K. Verma, Agrawal, I. Amorim, Prodan, Welfake: word embedding over linguistic features for fake news detection, IEEE Transactions on Computational Social Systems 8 (2021) 881–893.
[5] Liao, Chai, H. Han, Zhang, Wang, Xia, Ding, An integrated multi-task model for fake news detection, IEEE Transactions on Knowledge and Data Engineering 34 (2021) 5154–5165.
[6] Gravanis, Vakali, Diamantaras, Karadais, Behind the cues: A benchmarking study for fake news detection, Expert Systems with Applications 128 (2019) 201–213.
[7] Shu, Wang, H. Liu, Beyond news contents: The role of social context for fake news detection, in: Proceedings of the twelfth ACM international conference on web search and data mining, 2019, pp. 312–320.
[8] Shu, Zhou, Wang, Zafarani, H. Liu, The role of user profiles for fake news detection, in: Proceedings of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, 2019, pp. 436–439.
[9] M. R. Kondamudi, S. R. Sahoo, Chouhan, Yadav, A comprehensive survey of fake news in social networks: Attributes, features, and detection approaches, Journal of King Saud University - Computer and Information Sciences 35 (2023) 101571.
[10] J. Z. Pan, S. Pavlova, C. Li, N. Li, Y. Li, J. Liu, Content based fake news detection using knowledge graphs, in: The Semantic Web–ISWC 2018: 17th International Semantic Web Conference, Monterey, CA, USA, October 8–12, 2018, Proceedings, Part I 17, Springer, 2018, pp. 669–683.
[11] H. E. Wynne, Z. Z. Wint, Content based fake news detection using n-gram models, in: Proceedings of the 21st international conference on information integration and web-based applications & services, 2019, pp. 669–673.
[12] S. Raza, C. Ding, Fake news detection based on news content and social contexts: a transformer-based approach, International Journal of Data Science and Analytics 13 (2022) 335–362.
[13] J. Devlin, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[14] Y. Liu, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[15] X. Zhang, J. Cao, X. Li, Q. Sheng, L. Zhong, K. Shu, Mining dual emotion for fake news detection, in: Proceedings of the web conference 2021, 2021, pp. 3465–3476.
[16] A. Silva, L. Luo, S. Karunasekera, C. Leckie, Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data, in: Proceedings of the AAAI conference on artificial intelligence, volume 35, 2021, pp. 557–565.
[17] Q. Zhang, Z. Guo, Y. Zhu, P. Vijayakumar, A. Castiglione, B. B. Gupta, A deep learning-based fast fake news detection model for cyber-physical social services, Pattern Recognition Letters 168 (2023) 31–38.
[18] D. K. Vishwakarma, P. Meel, A. Yadav, K. Singh, A framework of fake news detection on web platform using convnet, Social Network Analysis and Mining 13 (2023) 24.
[19] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, J. Gao, Eann: Event adversarial neural networks for multi-modal fake news detection, in: Proceedings of the 24th acm sigkdd international conference on knowledge discovery & data mining, 2018, pp. 849–857.
[20] P. Qi, J. Cao, T. Yang, J. Guo, J. Li, Exploiting multi-domain visual information for fake news detection, in: 2019 IEEE international conference on data mining (ICDM), IEEE, 2019, pp. 518–527.
[21] D. Wang, W. Zhang, W. Wu, X. Guo, Soft-label for multi-domain fake news detection, IEEE Access 11 (2023) 98596–98606. doi:10.1109/ACCESS.2023.3313602.
[22] Q. Nan, J. Cao, Y. Zhu, Y. Wang, J. Li, Mdfend: Multi-domain fake news detection, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3343–3347.
[23] Y. Zhu, Q. Sheng, J. Cao, Q. Nan, K. Shu, M. Wu, J. Wang, F. Zhuang, Memory-guided multi-view multi-domain fake news detection, IEEE Transactions on Knowledge and Data Engineering 35 (2022) 7178–7191.
[24] H. Liu, W. Wang, H. Li, H. Li, Teller: A trustworthy framework for explainable, generalizable and controllable fake news detection, arXiv preprint arXiv:2402.07776 (2024).
[25] J. Li, X. Feng, T. Gu, L. Chang, Dual-teacher de-biasing distillation framework for multi-domain fake news detection, in: 2024 IEEE 40th International Conference on Data Engineering (ICDE), IEEE, 2024, pp. 3627–3639.
[26] G. Wang, J. Sun, J. Ma, K. Xu, J. Gu, Sentiment classification: The contribution of ensemble learning, Decision support systems 57 (2014) 77–93.
[27] L. Bilge, D. Balzarotti, W. Robertson, E. Kirda, C. Kruegel, Disclosure: detecting botnet command and control servers through large-scale netflow analysis, in: Proceedings of the 28th Annual Computer Security Applications Conference, 2012, pp. 129–138.
[28] V. Agate, F. Concone, A. De Paola, P. Ferraro, S. Gaglio, G. Lo Re, M. Morana, Adaptive ensemble learning for intrusion detection systems, in: CEUR Workshop Proceedings, volume 3762, CEUR-WS, 2024, pp. 118–123.
[29] V. Agate, D. Felice Maria, A. De Paola, P. Ferraro, G. Lo Re, M. Morana, A behavior-based intrusion detection system using ensemble learning techniques, in: ITASEC, 2022, pp. 207–218.
[30] Y. Yang, J. Cao, M. Lu, J. Li, C.-W. Lin, How to write high-quality news on social network? predicting news quality by mining writing style, arXiv preprint arXiv:1902.00750 (2019).
[31] R. Wardoyo, A. Musdholifah, G. A. Pradipta, I. N. H. Sanjaya, Weighted majority voting by statistical performance analysis on ensemble multiclassifier, in: 2020 Fifth International Conference on Informatics and Computing (ICIC), IEEE, 2020, pp. 1–8.
[32] G. Fumera, F. Roli, A theoretical and experimental analysis of linear combiners for multiple classifier systems, IEEE transactions on pattern analysis and machine intelligence 27 (2005) 942–956.