Categorizing False Information in News Content Using an Ensemble Machine Learning Model

Nataliya Boyko and Oleksandra Dypko

Lviv Polytechnic National University, Lviv 79013, Ukraine

Abstract
Society now faces a serious problem in the spread of false information and fake news in news content. Machine learning techniques have made it possible to detect fake news and limit its spread. This study examines the performance of an ensemble machine learning model in identifying false information in news content. Five machine learning methods are used in the study: Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Voting. The outputs of these algorithms are combined in the ensemble model to improve the precision and reliability of the classification results. The suggested model was trained and tested on a dataset of news stories labeled as true or false. Several evaluation metrics, such as precision, recall, and F1-score, are used to assess its performance. According to the results, the ensemble model performs better than the individual algorithms and identifies false information in news content with high accuracy. The contribution of each algorithm to the ensemble model's overall performance is also examined. According to the results, NB, SVM, LR, RF, and Voting all play a significant role in the ensemble model's accuracy. The Naive Bayes classifier (NB) achieved an accuracy of 93.6%, while the support vector machine (SVM) achieved a slightly higher accuracy of 94.9%. Logistic regression (LR) yielded an accuracy of 94.1%, and the decision tree (DT) an accuracy of 90.7%. The hard voting variant achieved an accuracy of 95%, outperforming all individual algorithms, while the soft voting variant attained an accuracy of 95.4%.
In conclusion, the ensemble machine learning model put forth in this study has the potential to be an important tool for spotting false information and preventing its spread. The research shows how integrating different machine learning techniques can improve the accuracy and consistency of classification outcomes. Further investigation could explore alternative ensemble approaches or evaluate the suggested model's real-world performance.

Keywords
Ensemble machine learning, fake news detection, misinformation, Naive Bayes, support vector machine, logistic regression, random forest, voting

SCIA-2023: 2nd International Workshop on Social Communication and Information Activity in Digital Humanities, November 9, 2023, Lviv, Ukraine
EMAIL: nataliya.i.boyko@lpnu.ua (N. Boyko); Oleksandra.Dypko.KNM.2018@lpnu.ua (O. Dypko)
ORCID: 0000-0002-6962-9363 (N. Boyko); 0000-0002-5488-4468 (O. Dypko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction
Fake news and misinformation are a serious problem for society. They can cause considerable harm by spreading rumors and false information and even by encouraging violence, which is why it is important to detect them and stop their spread [1; 2]. Machine learning methods are being used to detect fake news and misinformation, and they show considerable promise. These algorithms can automatically identify and flag news articles that seem suspicious, which in turn can help stop the spread of false information [3; 4].

This study examines an ensemble machine learning model's performance in identifying false information in news content.
Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Voting are the five machine learning algorithms used in the study. The outputs of these algorithms are combined to create an ensemble model, which improves the accuracy and robustness of the classification results [5; 6]. The proposed model aims to detect false information and misinformation in news articles by analyzing attributes such as content, organization, and origin. The study employs evaluation metrics such as precision, recall, and F1-score to gauge the ensemble model's effectiveness.

The main focus of this project is to explore different machine learning methods for determining whether a news item is fake. To achieve this objective, the following tasks need to be completed:
• review and analyze the subject area;
• pre-process the initial data for further classification;
• represent the data as fixed-length vectors;
• create classifier models and train them.

2. Review of the Literature
The publications listed below cover the literature on detecting fake news with machine learning techniques. Shinde et al. [7] conducted a literature review and identified various machine learning models used for fake news detection. Conroy et al. [8] introduced methods for finding fake news and proposed automatic deception detection models. Hassan and Meziane [9] surveyed fake news identification techniques for online and socially produced data. Cristianini and Shawe-Taylor [10] introduced support vector machines and other kernel-based learning methods, which are commonly used in fake news detection. Dietterich [11] presented ensemble methods in machine learning, which combine multiple classifiers to improve accuracy. Mahabub [13] proposed a robust technique for fake news detection using an ensemble voting classifier and compared its performance with other classifiers.
Overall, the reviewed literature demonstrates the variety of approaches and techniques used in fake news detection, highlighting the importance of selecting appropriate models and preprocessing techniques for the specific dataset and problem at hand.

3. Methods of solving
The first step in this project is to clean the data and convert it into a format that can be used for classification. The data is then split into two sets, one for training and one for testing (Fig. 1).

Figure 1: The architecture of the fake news detection system

Next, a combination of machine learning methods is applied to the training and test sets to determine whether a news item is fake. Evaluation metrics are used to measure how effective the models are, and based on this assessment, hyperparameters are selected to improve the classification accuracy [1; 13].

3.1. Data Preprocessing
Data preprocessing, which converts the data into a format suitable for analysis, is a crucial step in the machine learning process [3; 14]. First, irrelevant data is removed from the dataset. Next, any missing values are identified and handled so that they do not affect the final classification outcome. One-hot encoding is then used to process the dependent variable, specifically the headline column of each news story, which distinguishes between real and fake news. This step converts the labels into binary numbers, with genuine news labeled as 1 and fake news labeled as 0 [15; 16; 18].

The data preprocessing steps are as follows:
• Ensuring text consistency by converting all text to lowercase.
• Removing all punctuation marks.
• Tokenization, a process that divides the input sequence into meaningful units called tokens, which serve as the basic elements for subsequent semantic processing. These tokens can represent words, sentences, paragraphs, etc.
For example, Original message: ["ensemble based approach for detection of fake news using machine learning"]. Resulting message: ["ensemble", "based", "approach", "for", "detection", "of", "fake", "news", "using", "machine", "learning"].
• Eliminating stop words, which are insignificant language constructs that have a negative impact on the performance of machine learning systems. These are terms frequently used to link expressions in sentences. Examples of stop words in English include: a, where, above, an, until, does, will, who, when, that, what, but, by, on, about, once, and so forth. These terms are deleted from each document before moving on to the next step.
• Stemming, which transforms the grammatical forms of a word (noun, adjective, verb, adverb, etc.) into its root form (sometimes loosely referred to as a lemma). The main objective of stemming is to map words with similar meanings to a common base form.

3.2. Feature Extraction
Feature extraction is used to improve the model's precision. Uninformative features can increase training cost and reduce model performance and accuracy.

Word2Vec is a popular algorithm for natural language processing tasks that represents words as dense vector embeddings in a high-dimensional space. It is a neural network-based model that learns continuous word representations from large amounts of text data. The basic idea behind Word2Vec is to capture the semantic meaning of words and the relationships between them by representing words as vectors. It assumes that words with similar meanings are likely to appear in similar contexts. The model learns these representations either by predicting the context words surrounding a target word or by predicting a target word given its context words. During training, Word2Vec adjusts the vector representations of words to minimize the prediction error.
As a result, words that often appear in similar contexts end up with similar vector representations in the learned embedding space. These vector embeddings capture semantic relationships such as word similarity, analogies, and even certain syntactic relationships.

Another frequently used feature extraction algorithm in machine learning is TF-IDF, valued for its simplicity and reliability. TF-IDF comprises two components. TF (term frequency) reflects how often a word occurs in the current document and is computed using equation (1):

$$tf(t, d) = \log\bigl(1 + \mathrm{freq}(t, d)\bigr), \qquad (1)$$

where $tf$ is the term frequency, $t$ is a term (word), and $d$ is a document (set of words).

IDF (inverse document frequency) measures the significance of a word across all documents and is computed using equation (2). It assigns a value to each word, which makes it possible to assess the word's utility and importance:

$$idf(t, D) = \log\left(\frac{N}{\mathrm{count}(d \in D : t \in d)}\right), \qquad (2)$$

where $idf$ is the inverse document frequency, $D$ is the corpus, and $N$ is the number of documents in the corpus.

As an example, consider a document of 100 words in which the term "rumor" appears 4 times. Taking TF in its simplest form, as a relative frequency, gives TF = 4 / 100 = 0.04 (the log-scaled form in equation (1) would instead give log(1 + 4)). For IDF, suppose the corpus contains 200 documents and "rumor" appears in 100 of them. By equation (2), IDF(rumor) = log(200 / 100) = log 2 ≈ 0.301 with a base-10 logarithm. The TF-IDF score is the product of TF and IDF, so TF-IDF(rumor) ≈ 0.04 × 0.301 ≈ 0.012.

3.3. Naive Bayes
Naive Bayes classification is a method based on conditional probability, which indicates the likelihood of an event occurring given that another event has already occurred [8; 18].
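For reference, the conditional probability mentioned above is usually written through Bayes' theorem. The notation below (a class $c$ and features $x_1, \dots, x_n$) is introduced here for illustration and does not come from the original text; the second form holds only under the feature-independence assumption that gives the method its name.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
  \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The predicted class is the one that maximizes the right-hand side; the denominator can be dropped because it is the same for every class.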
It relies on Bayes' theorem and assumes that predictors are independent of each other, meaning that the presence or absence of one feature in a class does not depend on the other features. This independence assumption is what the term "naive" refers to. The classifier is commonly used in text classification tasks and is known for its simplicity and effectiveness. It performs particularly well in situations with limited data, sometimes surpassing more intricate models.

Three event models are used in Naive Bayes classification: the multivariate Bernoulli event model, the multinomial event model, and Gaussian Naive Bayes. In the multinomial model, a feature vector consists of values that represent the occurrence of each term, such as its frequency. The Bernoulli classifier, by contrast, records only whether a term is present or not, while the Gaussian classifier is used for continuous distributions.

3.4. Logistic Regression
A logistic regression (LR) model is used because it provides a clear equation for classification tasks that involve two or more classes. In the present study, text classification is based on several features and produces a binary outcome with two classes: true and fake news [9]. Several parameter settings are tested to maximize the accuracy of the LR model, and hyperparameter tuning is performed to obtain the best result for each individual dataset. The logistic regression hypothesis function is defined mathematically as (3):

$$h(X) = \frac{1}{1 + e^{-X}}, \qquad (3)$$

where $h(X)$ is the logistic regression hypothesis and $X$ denotes the independent variables.

Logistic regression uses a sigmoid function to transform raw inputs into probabilities, and its parameters are fitted by minimizing a cost function so that the predicted probabilities match the labels as closely as possible.
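The sigmoid mapping described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `sigmoid` and `predict_proba`, the weights, the bias, and the feature values are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for a two-feature example (illustrative values only).
weights, bias = [1.5, -2.0], 0.25
p = predict_proba(weights, bias, [0.8, 0.1])

# A 0.5 threshold turns the probability into a class label.
label = 1 if p >= 0.5 else 0
```

In a trained model the weights would come from minimizing the cost function rather than being set by hand.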
This probability value always falls within the range of 0 to 1.

3.5. Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm that can solve both classification and regression problems, although it is most frequently applied to classification. The SVM classifier is a high-performance method that operates by segmenting the data into separate regions [10]. With a choice of kernel functions, it offers an alternative approach to the binary classification problem. The primary goal of the SVM model is to determine a hyperplane (decision boundary) in the feature space that separates the data points of the two classes. The dimensionality of this hyperplane depends on the number of features. However, locating the optimal hyperplane, the one that maximizes the separation margin between the data points of the two classes, can be challenging, especially in higher-dimensional spaces where many candidate hyperplanes may exist [13; 17; 18].

To separate data points that belong to two different classes, as shown in Fig. 2, the SVM classifier draws a line (or a plane or hyperplane, depending on the dimensionality of the data). Points on one side of the line are assigned to one class, and points on the other side to the other class. To increase its confidence in these assignments, the classifier maximizes the distance between the line and the nearest points on either side of it.

Figure 2: Scheme of operation of the SVM classifier

SVM is thus used to find the maximum margin that divides the dataset into two groups and to determine which group any new data point belongs to. Many practitioners prefer support vector machines because they offer notable accuracy while consuming comparatively little computing power. With smaller and more focused datasets, it performs very well.
The support vector machine is memory-efficient and can handle high-dimensional spaces [11].

3.6. Random Forest Classifier
Random Forest is a simple, adaptable, and versatile supervised machine learning method. It can solve both classification and regression problems. To produce better predictions, it builds a forest out of a collection of decision tree models. In classification, each decision tree predicts a class, and the class with the most votes becomes the final prediction [12].

3.7. Voting Ensemble Classifier
The main goal of ensemble training is to increase model performance. An ensemble technique combines the predictions of two or more classifiers to create a model that can make more accurate predictions. The logic behind ensemble modeling is comparable to everyday practice, such as consulting a variety of experts before making a decision. Ensemble-based machine learning is therefore a technique for lowering risk in decision-making. A good illustration of this approach is the voting classifier, where the final classification relies on the votes cast by all the algorithms [13].

Hard Voting: the final decision is based on the majority vote of the individual classifiers. Only the most frequent prediction counts; confidence levels and probabilities are not considered.

Soft Voting: the final decision is based on a weighted average or sum of the predictions of the individual classifiers, taking into account their confidence levels or probabilities. This allows for more nuanced decision-making.

Spam detection, text categorization, optical character recognition, face recognition, and other tasks have all benefited from ensemble learning. Ensemble learning is applicable wherever machine learning techniques are applicable.
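The difference between the two voting rules described above can be seen in a small sketch. It is a plain-Python illustration under stated assumptions, not the authors' code: the helper names `hard_vote` and `soft_vote` and the three probability vectors are made up for the example, with class 0 standing for fake and class 1 for genuine news.

```python
from collections import Counter


def hard_vote(labels):
    """Return the majority class among the classifiers' predicted labels.

    With a tie, the label that was encountered first wins.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Average per-class probabilities (optionally weighted), then take the argmax."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__), avg


# Hypothetical outputs of three classifiers for one article: [P(fake), P(genuine)].
probas = [
    [0.55, 0.45],  # weakly "fake"
    [0.60, 0.40],  # weakly "fake"
    [0.10, 0.90],  # strongly "genuine"
]

# Each classifier's own label is the argmax of its probabilities.
labels = [max(range(2), key=p.__getitem__) for p in probas]

hard = hard_vote(labels)       # two of three classifiers say class 0
soft, avg = soft_vote(probas)  # averaged probabilities favor class 1
```

In this constructed case the two rules disagree: two weakly confident votes win under hard voting, while one strongly confident classifier tips the averaged probabilities under soft voting.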
While ensemble learning can significantly boost model performance, it also adds complexity to the training and deployment process. Ensuring proper model calibration, handling imbalanced data, and selecting the right ensemble method are crucial considerations for successful implementation.

Because it can combine two or more learning models that have been trained on the entire dataset, the voting ensemble is frequently used for classification problems. The ensemble is built from multiple independently trained models and predicts the output class with the highest probability among these models. The voting classifier can use either of the two methods above, hard or soft voting, to determine the final prediction.

4. Experiments
Two datasets were used in this study, one containing false messages and the other containing true messages. Both come from the "Fake News" dataset on the Kaggle platform and were combined into a single dataset for model training. The combined dataset initially comprised 44,689 records. After a preliminary analysis, any attributes deemed unnecessary for further data processing were removed.

Before analyzing the dataset and training models on it, it was important to determine the proportion of the data in each category. A pie chart (Fig. 3a) was constructed to show the percentage of each category in the complete dataset.

Figure 3: a) Percentage ratio of data of two classes; b) Percentage ratio of data by category

According to Fig. 3a, 52.2% of the dataset consists of fake news messages, while 47.5% is composed of real news messages. The fake news category is shown in blue, and the true news category in purple. Since both categories are roughly the same size, the dataset is well balanced, with a nearly equal distribution of data in each class.
A balanced dataset typically leads to higher accuracy, higher balanced accuracy, and an even detection rate across classes, which underscores the importance of dataset balance for effective model performance.

The dataset includes news articles from various categories, and the distribution of news across categories is shown in Fig. 3b. The dataset encompasses eight categories. The majority of the news falls under political news, followed by a significant portion of world news. The remaining categories consist of news articles grouped by region.

Data pre-processing. Examination of the data showed that the raw text can contain irrelevant or unimportant information, which can reduce classification accuracy and make analysis more difficult. To counteract this, the next phase is to pre-process the data: irrelevant information is eliminated from the dataset and the data is prepared for further processing. To illustrate the result of each text transformation, several news examples from the dataset are followed through the processing stages, starting with Table 1.

Table 1
News Dataset Before Preprocessing

| Category | News |
|---|---|
| 1 | “Scientists have discovered that eating chocolate every day can make you lose weight.” |
| 0 | “The unemployment rate in the country dropped to 4.2% last month, according to the latest government report.” |

The initial phase of data preprocessing is the elimination of punctuation. While punctuation adds grammatical context to a sentence and aids human comprehension, it is irrelevant to the vectorizer, which only counts words without regard to context.
Therefore, to effectively use the vectorizer later on, all special characters must be removed.

Table 2
News Dataset After Removing Punctuation

| Category | News |
|---|---|
| 1 | Scientists have discovered that eating chocolate every day can make you lose weight |
| 0 | The unemployment rate in the country dropped to 42 last month according to the latest government report |

The outcome of this first stage is shown in Table 2, in which symbols such as ",..?!)" are absent. The next step is converting the text to lowercase. Lowercasing is a commonly used text preprocessing technique that ensures a uniform case format in the input text. After conversion, variations such as "text", "Text", and "TEXT" are treated as equivalent (Table 3).

Table 3
News Dataset in Lowercase

| Category | News |
|---|---|
| 1 | scientists have discovered that eating chocolate every day can make you lose weight |
| 0 | the unemployment rate in the country dropped to 42 last month according to the latest government report |

The outcome of the second phase of message preprocessing is presented in Table 3. Each message has been converted to lowercase, so sentences no longer start with a capital letter.

The next step is tokenization of the news text. Tokenization is the process of dividing a text document into smaller units called tokens. These tokens can be words, symbols, or even subwords. This study uses word-level tokenization, which breaks each sentence down into its individual words. Table 4 shows that each word in a sentence is treated as a distinct token. Tokenization plays a crucial role in text processing: the meaning of each sentence is derived from the words it contains, so examining those words makes it possible to determine the overall content of the text. With a list of words, statistical techniques can be employed to gain more insights from the text.
For instance, word count and word frequency analyses can help identify the significance of a word in a sentence or document.

Table 4
News Dataset After Tokenization

| Category | News |
|---|---|
| 1 | ['scientists', 'have', 'discovered', 'that', 'eating', 'chocolate', 'every', 'day', 'can', 'make', 'you', 'lose', 'weight'] |
| 0 | ['the', 'unemployment', 'rate', 'in', 'the', 'country', 'dropped', 'to', '42', 'last', 'month', 'according', 'to', 'the', 'latest', 'government', 'report'] |

Common words found in natural language, such as the English articles "the" and "a", are referred to as stop words. These words often add little value to further analysis and can be removed from the text. Pre-compiled lists of stop words are available for many natural languages and can be used from Python, which makes them convenient in text processing (Table 5).

Table 5
News Dataset After Deleting Stop-Words

| Category | News |
|---|---|
| 1 | ['scientists', 'have', 'discovered', 'eating', 'chocolate', 'every', 'day', 'make', 'you', 'lose', 'weight'] |
| 0 | ['unemployment', 'rate', 'country', 'dropped', '42', 'last', 'month', 'latest', 'government', 'report'] |

Machine learning algorithms typically require numeric data, including in natural language processing tasks. The text must therefore first undergo vectorization, which transforms it into a numerical vector representation. TF-IDF vectorization computes the TF-IDF score of every word in the dataset with respect to each message and uses these scores to build a vector. Each message in the dataset thus has its own vector of TF-IDF scores, one per word in the vocabulary of the entire set of messages. These vectors have various applications, including assessing document similarity through the cosine similarity between TF-IDF vectors.
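The preprocessing and vectorization steps above can be sketched end to end in plain Python. This is a hedged illustration, not the pipeline used in the study: the names `STOP_WORDS`, `preprocess`, `tfidf_vectors`, and `cosine` are hypothetical, the stop-word list is a tiny hand-picked subset, the third sentence is invented for the comparison, and the weighting follows equations (1) and (2) with natural logarithms.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "in", "to", "that", "can", "according", "a", "of"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = log(1 + freq) and idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        freq = Counter(doc)
        vectors.append(
            [math.log(1 + freq[t]) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


raw = [
    "Scientists have discovered that eating chocolate every day can make you lose weight.",
    "The unemployment rate in the country dropped to 4.2% last month, "
    "according to the latest government report.",
    "Scientists report that the unemployment rate dropped.",  # invented third message
]
docs = [preprocess(r) for r in raw]
vocab, vecs = tfidf_vectors(docs)

# The third message shares more terms with the second than with the first.
sim_first_third = cosine(vecs[0], vecs[2])
sim_second_third = cosine(vecs[1], vecs[2])
```

With this IDF definition, a term that occurs in every document receives a weight of zero; library implementations often add smoothing terms, so their scores will differ from this sketch.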
The term frequency (TF) component of TF-IDF indicates the relative frequency of a word within a document, taking into account the total number of words in the document. The inverse document frequency (IDF), on the other hand, is the inverse of the frequency with which a specific word is used across multiple documents.

Figure 4: Value of the TF-IDF statistic for each word

In Fig. 4, the numbers in the top right corner give the index of the sample and of its token, and the number in the bottom right corner is the computed TF-IDF value, which shows how important the word is in the text.

The accuracy of different machine learning models in classifying false news is compared next. The models are NB (Naive Bayes classifier), SVM (Support Vector Machine), LR (Logistic Regression), DT (Decision Tree), and the ensemble method (Voting Classifier) in its two variants, hard and soft. The models are evaluated with metrics such as precision, accuracy, recall, and F1-score.

Table 6
Evaluation of Machine Learning Classifiers by Different Metrics

| Classifier | Precision | Recall | F1-Score | Accuracy | LogLoss |
|---|---|---|---|---|---|
| DT | 0.916 | 0.913 | 0.914 | 0.907 | 3.347 |
| NB | 0.930 | 0.955 | 0.942 | 0.936 | 2.291 |
| LR | 0.953 | 0.938 | 0.945 | 0.941 | 2.121 |
| SVM | 0.958 | 0.948 | 0.953 | 0.949 | 1.831 |
| Hard Voting | 0.965 | 0.942 | 0.953 | 0.95 | 1.795 |
| Soft Voting | 0.956 | 0.959 | 0.958 | 0.954 | 1.657 |

In Table 6, the classifiers DT, NB, LR, SVM, Hard Voting, and Soft Voting are compared using Precision, Recall, F1-score, Accuracy, and Log Loss. Accuracy rises from row to row, from 0.907 for DT to 0.954 for Soft Voting. Log Loss, also known as logarithmic loss, serves as a critical gauge of model effectiveness. In binary classification, Log Loss reflects how closely the predicted probability aligns with the actual value of 0 or 1. As the predicted probability diverges from the actual value, the Log Loss increases.
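The behavior of Log Loss described above can be checked with a short sketch. The function below implements the standard binary cross-entropy averaged over samples; the name `log_loss`, the clipping constant `eps`, and the three prediction lists are illustrative assumptions rather than values from the study.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities of class 1 for four articles.
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]      # correct and confident
hesitant = [0.6, 0.4, 0.55, 0.45]           # correct but close to 0.5
one_confident_miss = [0.9, 0.1, 0.8, 0.95]  # last article: wrong and confident

loss_confident = log_loss(y_true, confident_right)
loss_hesitant = log_loss(y_true, hesitant)
loss_miss = log_loss(y_true, one_confident_miss)
```

Of the three prediction sets, the one with a single confident mistake receives the largest loss, even though the hesitant set is less certain on every article.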
A lower Log Loss is therefore better, and Table 6 shows it decreasing from DT to Soft Voting. Accuracy, a metric that broadly assesses a model's performance across all classes, indicates that the Soft Voting ensemble method outperforms the others by achieving the highest accuracy.

5. Results
An ensemble approach is a potent technique for enhancing model performance by merging different base models into a stronger one. The Voting Classifier trains several base models, or estimators, and generates predictions by consolidating the outcomes of each of these underlying estimators. The consolidation criterion is a vote over the results of the individual estimators. To identify fake news, two separate ensemble techniques were used:
• Hard Voting: the vote is decided by the predicted class;
• Soft Voting: the vote is calculated using the predicted probabilities for each class.

The results of the Ensemble Voting approach in both its Hard and Soft variants are evaluated using Precision, Recall, F1-score, Accuracy, and Log Loss, as shown in Table 7.

Table 7
Evaluation of Ensemble Methods by Different Metrics

| Classifier | Precision | Recall | F1-Score | Accuracy | LogLoss |
|---|---|---|---|---|---|
| Hard Voting | 0.965 | 0.942 | 0.953 | 0.95 | 1.795 |
| Soft Voting | 0.956 | 0.959 | 0.958 | 0.954 | 1.657 |

The Hard Voting classifier achieved an accuracy of 95%, a logarithmic loss of about 1.80, and an F1-score of 95.3%. The Soft Voting classifier obtained an accuracy of 95.4%, a logarithmic loss of about 1.66, and an F1-score of 95.8%. These results indicate that Soft Voting outperforms Hard Voting on the task considered in this study. Fig. 5 shows the confusion matrix of the Hard Voting model.

Figure 5: Confusion matrix for Hard Voting

Regarding the confusion matrix for the Hard Voting technique, Fig.
5 illustrates that 7% of news items from the true category were incorrectly classified as false (false negatives). Likewise, 3% of false news items were erroneously identified as true (false positives). The confusion matrix for Soft Voting is shown in Fig. 6. It illustrates that 5% of news articles from the genuine category were incorrectly classified as false (false negatives), while 4% of false news articles were inaccurately predicted as true (false positives).

Figure 6: Confusion matrix for Soft Voting

The results above summarize the outcomes of the models used to detect fake news. The dataset was analyzed using traditional machine learning classifiers: decision trees, logistic regression, support vector machines, and naive Bayes. To improve accuracy, a composite fake news detection system was built that uses a Voting Classifier together with the classifiers and features mentioned earlier. According to the experimental findings (Table 7, Soft Voting), the proposed approach achieves an accuracy of 95.4%, a precision of 95.6%, a recall of 95.9%, and an F1-score of 95.8%. The evaluation shows that the Soft Voting technique produced more accurate results than the individually trained models.

6. Discussion
The findings of this study show how well ensemble machine learning techniques work for spotting fake news. The Naive Bayes, Support Vector Machine, Logistic Regression, Random Forest, and Voting classifiers are combined into an ensemble model that outperforms each individual classifier, achieving an accuracy of about 95%. This result is in line with earlier research demonstrating that ensemble methods can raise classification accuracy by fusing different models. What sets this study apart is its application of ensemble techniques to the challenge of identifying false information. While ensemble learning has been widely employed in areas such as image and speech recognition, its potential for fake news detection remains relatively unexplored.
By harnessing the strengths of individual classifiers and mitigating their weaknesses through the fusion of multiple machine learning models, a more accurate and dependable model was obtained. Aside from ensemble techniques, this study considers feature extraction methods such as TF-IDF and Word2Vec to boost classifier performance. These methods transform unstructured text data into numerical features, allowing machine learning models to discern patterns in the data. Our results indicate a noteworthy enhancement in classifier performance, underscoring the importance of these techniques in the fake news detection process.

Overall, the results of this study show how effective ensemble machine learning methods are for identifying fake news and offer useful suggestions for improving the feature extraction procedure. The application of ensemble methods to fake news detection offers a promising avenue for future research and has the capacity to enhance the trustworthiness and precision of fake news detection systems.

7. Conclusion
An ensemble machine learning model was employed to assess its effectiveness in detecting false information and fake news within news content. This model combined the outcomes of five distinct algorithms, Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Voting, with the aim of enhancing the precision and robustness of the classification results.

The findings of this research carry significant implications for combating the dissemination of false information. The suggested ensemble method holds the potential to enhance the accuracy and robustness of classification results. Machine learning techniques are increasingly recognized as a promising approach for identifying false information.
As a result, the proposed ensemble machine learning model is a useful method that can aid in the creation of more precise and reliable solutions for preventing the spread of false information. Future work could investigate the effectiveness of the suggested model in a practical setting in more detail, as well as the use of other ensemble techniques.

8. Acknowledgements

The study was conducted within the framework of a project financed by the National Research Fund of Ukraine, registration No. 2021.01/0103, "Methods and means of researching markers of ageing and their influence on post-ageing effects for prolonging the working period", which is being carried out at the Department of Artificial Intelligence Systems of the Institute of Computer Sciences and Information Technologies of the Lviv Polytechnic National University.

9. References

[1] M. S. Mokhtar, Y. Y. Jusoh, N. Admodisastro, N. C. Pa, A. Y. Amruddin, Fake News Detection System Using Logistic Regression Technique In Machine Learning, IJEAT 9(1) (2019) 2407–2410. doi: 10.35940/ijeat.A2633.109119.
[2] T. Basyuk, A. Vasilyuk, V. Lytvyn, Mathematical model of semantic search and search optimization, CEUR Workshop Proceedings 2362 (2019) 96–105.
[3] A. A. Alim, A. Ayman, K. D. Praveen, S. Ch. Myung, Detecting Fake News using Machine Learning: A Systematic Literature Review, Psychology and Education Journal 58(1) (2021) 1932–1939. doi: 10.17762/pae.v58i1.1046.
[4] N. Boyko, Y. Kholodetska, Using Artificial Intelligence Algorithms in Advertising, in: 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 2022, pp. 317–321. doi: 10.1109/CSIT56902.2022.10000819.
[5] Z. Khanam, B. N. Alwasel, H. Sirafi, M. Rashid, Fake News Detection Using Machine Learning Approaches, IOP Conference Series: Materials Science and Engineering 1099(1) (2021). doi: 10.1088/1757-899X/1099/1/012040.
[6] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake News Detection on Social Media: A Data Mining Perspective, ACM SIGKDD Explorations Newsletter 19(1) (2017) 22–36. doi: 10.1145/3137597.3137600.
[7] S. V. Shinde, S. Matkar, G. Karale, J. Tonge, Fake News Detection Using Machine Learning Literature Review, Journal of Emerging Technologies and Innovative Research 9(11) (2022). URL: https://www.jetir.org/papers/JETIR2211390.pdf.
[8] N. K. Conroy, V. L. Rubin, Y. Chen, Automatic deception detection: Methods for finding fake news, Proceedings of the Association for Information Science and Technology 52(1) (2016). doi: 10.1002/pra2.2015.145052010082.
[9] E. A. Hassan, F. Meziane, A Survey on Automatic Fake News Identification Techniques for Online and Socially Produced Data, in: 2019 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), Khartoum, Sudan, 2019, pp. 1–6. doi: 10.1109/ICCCEEE46830.2019.9070857.
[10] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 1st ed., Cambridge University Press, 2000. doi: 10.1017/CBO9780511801389.
[11] T. G. Dietterich, Ensemble Methods in Machine Learning, in: Multiple Classifier Systems, volume 1857, Springer, Berlin, Heidelberg, 2000. doi: 10.1007/3-540-45014-9_1.
[12] N. Boyko, K. Kmetyk-Podubinska, I. Andrusiak, Application of Ensemble Methods of Strengthening in Search of Legal Information, in: Babichev S., Lytvynenko V. (Eds.), Lecture Notes in Computational Intelligence and Decision Making, volume 77 of Lecture Notes on Data Engineering and Communications Technologies, 2021, pp. 188–200. doi: 10.1007/978-3-030-82014-5_13.
[13] A. Mahabub, A robust technique of fake news detection using Ensemble Voting Classifier and comparison with other classifiers, SN Applied Sciences 2(4) (2020). doi: 10.1007/s42452-020-2326-y.
[14] G. T. Reddy et al., Analysis of Dimensionality Reduction Techniques on Big Data, IEEE Access 8 (2020) 54776–54788. doi: 10.1109/ACCESS.2020.2980942.
[15] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 1st ed., Cambridge University Press, 2000. doi: 10.1017/CBO9780511801389.
[16] I. Ahmad, M. Yousaf, S. Yousaf, M. O. Ahmad, Fake News Detection Using Machine Learning Ensemble Methods, Complexity 3 (2020) 1–11. doi: 10.1155/2020/8885861.
[17] O. Mediakov, T. Basyuk, Specifics of Designing and Construction of the System for Deep Neural Networks Generation, CEUR Workshop Proceedings 3171 (2022) 1282–1296. URL: https://ceur-ws.org/Vol-3171/paper94.pdf.
[18] N. Boyko, O. Dypko, Machine Learning Methods for the Detection of Misinformation in News Content, International Journal of Multidisciplinary and Current Educational Research (IJMCER) 5(3) (2023) 31–42. URL: https://www.ijmcer.com/wp-content/uploads/2023/07/IJMCER_D053031042.pdf.