Application of Neural Networks to Identify Fake News

Iryna Afanasieva, Nataliia Golian, Vira Golian, Artem Khovrat and Kostiantyn Onyshchenko

Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine

Abstract
The problem of determining information reliability has become especially acute in connection with the significant social unrest of recent times. At the same time, news resources use aggregated data from social networks, which is increasingly difficult to verify. Various classification algorithms make it possible to speed up this process; however, in critical conditions the accuracy of the forecast becomes low. The current study investigates the effectiveness of using neural networks to detect fake news. To increase the accuracy of the classification, a data preprocessing algorithm was created based on the basic principles of natural language processing. Based on the results of this study, linguistic patterns of fake news were identified, and the resulting templates became the basis for data preprocessing. The features of convolutional and recurrent neural networks and their modifications for the analysis of text data are disclosed. To compare the models, a set of indicators characterizing the efficiency of the algorithms was chosen. The classification accuracy of these models was tested on data related to the election of the President of the United States and the large-scale invasion of the Russian Federation into the territory of Ukraine. The resulting indicator of the effectiveness of the classification of fake news allows us to state the feasibility of using certain modifications of both models to verify the reliability of information.

Keywords
Classification, CNN, information reliability, LSTM, Neural Network, RNN

COLINS-2023: 7th International Conference on Computational Linguistics and Intelligent Systems, April 20–21, 2023, Kharkiv, Ukraine
EMAIL: iryna.afanasieva@nure.ua (I. Afanasieva); nataliia.golian@nure.ua (N. Golian); vira.golan@nure.ua (V. Golian); artem.khovrat@gmail.com (A. Khovrat); kostiantyn.onyshchenko@nure.ua (K. Onyshchenko)
ORCID: 0000-0003-4061-0332 (I. Afanasieva); 0000-0002-1390-3116 (N. Golian); 0000-0001-5981-4760 (V. Golian); 0000-0002-1753-8929 (A. Khovrat); 0000-0002-7746-4570 (K. Onyshchenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

According to the Cambridge Dictionary, "fake news" is "false stories that appear to be news, spread on the internet or using other media, usually created to influence political views or as a joke" [1]. Its history is quite long; however, with the growing popularity of social networks, especially anonymous ones, the problem has become acute for world society. Its catalysts also include the development of technologies for manipulating video and audio information and for creating bots.

Before the beginning of the full-scale invasion of the Russian Federation into the territory of Ukraine, a vivid example of manipulation with fakes was the information about the spread of the coronavirus from China, in particular the story about eating bats and the global conspiracy regarding the vaccine [2]. Before that, there were reports of news manipulation during the election campaign in India, where, according to some sources, 40% of the information was fake [3], and during the 2018 presidential elections in Brazil, where 11,957 viral messages were distributed via WhatsApp, of which 42% contained information deemed false by regulatory authorities [4].

As of today, given the general political destabilization in the world, the number of fakes has increased both in absolute and relative terms. Some of them can be attributed to the incorrect subjective perception of real information, but a significant number are the result of military propaganda. If in the case of elections the purpose of spreading fake news was to influence opinions, now it is aimed at demoralizing the population and the military to suppress their resistance. In addition, there are attempts to influence people who are not parties to a full-scale war in order to discredit the aid provided. The largest channels of such information are Twitter and Telegram [5]. In some cases, hacker intrusions into the sites of information sources with subsequent data substitution are possible; however, this case is outside the scope of the current work and is relevant to data transfer models [6].

In general, many initiatives to combat such news were created even before the current events. For example, there is the French law against the manipulation of information, adopted to combat the discrediting of immigrants and the European Union after the announcement of the Brexit results [7]. This law states that platforms exceeding a certain number of visits per day must have a legal representative in France and publish their algorithms, while any sponsored content must be disclosed by publishing the author's name and the amount paid. The law also requires judges to qualify fake news based on the following three criteria:
• transparency;
• deliberate distribution on a mass scale;
• leading to violations of public order or compromising election results.
At the same time, the initial decision is made by a specially created ethics committee. The creation of a legal framework for the regulation of fake information is also observed in Ukraine; for instance, Article 259 of the Criminal Code regulates responsibility for knowingly false information about a threat to the safety of citizens.
The global trend towards combating such information is generally positive, but the problem lies in determining whether a piece of information is fake or not. If the decision is based not on facts but on expert assessment, the situation itself can become manipulative; this is observed in the Russian Federation, Iran, and the DPRK. For democratic countries, the process of identifying fake news can be automated using artificial intelligence. Initially, naïve Bayes or SVM served as the basic methods, but with the development of neural networks, convolutional and recurrent architectures gained the most popularity. It is worth noting that, in light of numerous studies, these models can give different results depending on the subject area [8, 9]. Therefore, it was decided to compare the effectiveness of these models in identifying fake news, in order to optimize their selection when building an appropriate software system.

2. Domain analysis

Firstly, the task of identifying fake news belongs to natural language processing (NLP); therefore, before formalizing the purpose of this work, we will consider several basic concepts that will be used later.

Let's start with the TF-IDF characteristic. Software systems are not able to process text information directly, so each text should acquire a quantitative form. There are several basic techniques for this type of conversion, but currently TF-IDF is the fundamental one. TF – short for term frequency – is the frequency of each word used (more precisely, of a set of words that are more appropriately called terms). IDF – short for inverse document frequency – is the inverse of the share of documents that contain a given term. In general, the TF-IDF score indicates how distinctive a certain term is for a document. For example, interjections, conjunctions, or exclamations will be the most common and, accordingly, will have a low TF-IDF.
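The computation just described can be sketched in a few lines of Python; the toy corpus and the plain (unsmoothed) TF-IDF variant are illustrative assumptions rather than the exact scheme used later in the paper:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for each term of each tokenized document."""
    n = len(docs)
    # document frequency: in how many documents each term occurs
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [
    "the election results are in".split(),
    "the shocking truth about the election".split(),
    "bats caused the outbreak".split(),
]
scores = tf_idf(docs)
```

Here "the" appears in every document, so its IDF (and hence TF-IDF) is zero, while a rare term such as "bats" receives a positive score: exactly the ranking behaviour described above.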
By itself, the TF-IDF characteristic only allows ranking terms; it does not convert the text into a numerical representation. Word Embedding technology exists to perform this action: it maps words or phrases into vectors of real numbers. This technology covers a set of various methods, one of which is GloVe (Global Vectors). GloVe is an algorithm for transforming unlabeled data (in our case, terms) into continuous vectors of reduced dimensionality [10]. GloVe vectors are pre-trained on data from Wikipedia and Gigaword 5, so they capture the semantics of sentences quite well. It is worth noting that this algorithm is aimed specifically at texts in English, not Ukrainian.

Having clarified the key concepts, we can move on to how exactly fake news differs from real news (without reference to architectural models of neural networks). For demonstration, let's use the data set for the 2016 presidential election in the United States [11]. It contains 20,015 news items, of which 11,941 are fake and 8,074 are real. The real news in this set comes from well-known authoritative news websites, such as the New York Times, Washington Post, etc. As can be seen in the figure below (see Fig. 1), the records contain a lot of different information, including the title of the article, text, image, author, website, and much more. In what follows, we will use only the title and text.

Figure 1: Part of target dataset

After computing TF-IDF characteristics, it was found that fake news headlines often contain words like "notitle", "IN", "THE", "CLINTON" and many unrelated numbers representing special symbols. Several interesting conclusions can be drawn from this. First, most fake news stories do not have headlines; they are widely distributed in the form of tweets with several keywords and hyperlinks to news stories on social media. Secondly, in fakes, most characters are in uppercase.
The goal is to grab the attention of readers, whereas real news has fewer capital letters and is generally written in a standard format. Third, real news contains more detailed descriptions, for example, names (Jeb Bush, Mitch McConnell, etc.) and verbs (left, claim, debate, poll, etc.).

For a deeper understanding, let's consider the problem of detecting fakes from the perspective of computational linguistics, psychological positioning, lexical diversity, and sentiment analysis. The first thing to note is that fake news is on average shorter than real news: 3,943 words versus 4,360 (69 sentences versus 84). In addition, the spread of word counts in fake news is much larger than in real data. Despite having fewer characters and sentences, fake news also has shorter sentences. This is because editors and journalists follow certain newspaper norms, which include the length and choice of words, the absence of grammatical errors, etc., while fake news is usually not based on these rules.

According to the obtained results, real news has fewer question marks than fake news. The reason may be that fake news is rich in rhetorical questions that are used to deliberately emphasize ideas and reinforce sentiments. In addition, both types of news have very few exclamatory sentences, and therefore exclamation marks, although their number is still greater in fake news.

Particular attention should be paid to the cognitive load expressed in words of opposition ("but", "without", "however") and denial ("no", "not"). True texts contain more objections and oppositions. This can be explained by the fact that the creator of fake news has to be more specific and accurate and pay attention to appeals; this also reduces the likelihood of self-contradiction [12]. It is also worth noting that first-person pronouns ("I" and "we") are used less often by deceivers.
This state of affairs can be connected with the need to present false information from an objective point of view. In this case, it is more appropriate to use impersonal sentences or addresses in the second and third person plural ("you", "they"). Some psychological studies indicate that such use of pronouns is characteristic not only of fake news but of people who tell lies in general. The reason is that in this way people shift responsibility for deception from themselves to others [13].

Lexical diversity, in general, is a measure of how many different words are used in a text, while lexical density is the proportion of lexical items (i.e., nouns, verbs, adjectives, and some adverbs) in a text. Considering the obtained data, we can note that there is more diversity in real news: 2.2e-06 versus 1.76e-06 for fake news. This can be explained by the fact that false information is usually aimed at a less educated audience without developed critical thinking.

The mood in real and fake news is significantly different: fake news is more negative. Several reasons can explain this phenomenon. First, the creation of false information is aimed at inciting the population (the current statement is valid for both political and other spheres of activity); accordingly, the mood of the text should have a negative color. In addition, some psychological studies indicate that this is caused by the author's inner sense of guilt [14].

Having clarified the main nuances that can affect the course of the experiment, we formalize the goal set in the introduction. Given a set of m text news items, which can be represented as follows:

A = {a_1, a_2, ..., a_m}, (1)

in the task of detecting fake news it is necessary to predict whether an article contains fake news or not.
At the same time, the set of labels indicating the veracity of the information can be presented in the following form:

y = {y_1, y_2, ..., y_m}, y_i ∈ {0, 1}, (2)

where 1 – real news, 0 – fake. The set of features X = {x_1, x_2, ..., x_m} should be obtained by parsing the text of each article, e.g. using TF-IDF and Word Embedding. In this way, the following model of formation of news labels will be embedded in the selected architectures F:

y_i = F(x_i), i = 1, ..., m. (3)

3. Mathematical representation

After obtaining a mathematical representation of the problem and reviewing the necessary data regarding the subject area, we proceed to the selected architectures.

3.1. Convolutional Neural Network

In general, the convolutional neural network (see Fig. 2) is a rather distinctive type of neural network that was primarily used to analyze graphical information. This distinctiveness manifests itself in several features:
• the layers have three dimensions: depth, height, and width;
• nodes in each layer are connected only to a small region of the previous layer, not to all of it;
• the result is collapsed into a set of probabilistic estimates grouped along the depth axis;
• to determine the descriptors, the network performs several operations of convolution and pooling;
• in addition to finding descriptors, data classification takes place.

Figure 2: Overview of convolution neural network [15]

As can be seen from Figure 2, a CNN consists of three types of layers: convolutional, subsampling (pooling), and perceptron. The main layer of a convolutional neural network is undoubtedly the convolution layer, whose work is the basis of this network. Its parameters are filters that have small spatial dimensions (width and height) but pass through the entire depth of the input data volume; for example, a standard first-layer filter of a CNN can be a small tensor spanning the full input depth. While passing through the network, the filter is slid over the width and height of the input data, and at each position the scalar product between the filter entries and the input information is computed.
As a result, a two-dimensional activation map is formed, which gives the response of the filter at each spatial position. The number of such activation maps is equal to the number of filters used. After the activation maps are formed, an activation function is applied before the data is passed to the next layer. As an example often used for the analysis of textual information, we can cite the ReLU function:

f(x) = max(0, x). (4)

For this network, three hyperparameters controlling the size of the output are considered. The first of them is depth: it is equivalent to the number of filters used in the network. The second is the stride with which the filter performs the pass: the larger it is, the fewer positions at which the filter is applied and the smaller the output matrix. When passing the filter, there are situations when it is convenient to pad the data boundaries with zeros; the extent of this padding is the third hyperparameter. Two further important parameters are worth mentioning: the kernel size and the bias.

When analyzing text information, the general principle of building the architecture remains unchanged; only the text vectorization step is added. Unlike images, textual information is one-dimensional, so the number of convolution dimensions is 1. Mathematically, in this case the key linear operation of the CNN can be represented as a linear layer applied to a window of size k. The operation takes as input a concatenation of word vectors:

x_i = [w_i; w_{i+1}; ...; w_{i+k-1}], (5)

and multiplies it by the convolution matrix:

y_i = W · x_i. (6)

Regarding the key parameters, the kernel size usually ranges from 2 to 5. It is not advisable to choose a stride greater than 3 when examining text, as this will affect the overall quality of the analysis; regardless of the subject area, this parameter is very important. The depth is 1, so zero padding is only necessary if the stride is not equal to one.
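To illustrate formulas (4)–(6), here is a minimal pure-Python sketch of a one-dimensional convolution with ReLU activation; scalar token features stand in for the embedding vectors, and the kernel values are arbitrary:

```python
def conv1d(xs, kernel, stride=1, bias=0.0):
    """Slide a window of len(kernel) over the sequence with the given
    stride; each output is the scalar product of the window and the
    kernel plus the (optional) bias, passed through the ReLU of eq. (4)."""
    k = len(kernel)
    out = []
    for i in range(0, len(xs) - k + 1, stride):
        s = sum(x * w for x, w in zip(xs[i:i + k], kernel)) + bias
        out.append(max(0.0, s))  # ReLU: f(x) = max(0, x)
    return out

# toy one-dimensional "text": one scalar feature per token
xs = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
feature_map = conv1d(xs, kernel=[0.5, -0.5, 1.0])  # kernel size 3, stride 1
```

With stride 1 and kernel size k the feature map has len(xs) − k + 1 entries; a stride of 2 would halve the number of positions at which the filter is applied, matching the discussion of the stride hyperparameter above.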
Regarding the bias parameter, it is usually not used, although such a possibility is provided. After the convolution is obtained, the pooling layer aggregates the features in a certain region. There are several methods of pooling; however, for text information the most popular is max-pooling, which selects the maximum value of each filter, and it will be used later. Since news texts are sometimes quite long, many layers have to be applied to them. Unfortunately, in this case there may be a problem with the propagation of gradients from top to bottom through the deep network. To avoid this, one can use residual connections or a more complex option – trunk (highway) connections. After pooling, the usual process of obtaining a probability distribution takes place, as in other neural networks.

3.2. Recurrent Neural Network

Having considered the architecture of convolutional networks, let's move on to recurrent networks (see Fig. 3). Among their general features, it is worth highlighting:
• a cyclic connection between the layers;
• a large number of modifications depending on the amount of data;
• two-dimensional input;
• short-term memory that allows previous data to be considered when creating an output.

Figure 3: Overview of recurrent neural network [16]

Figure 3 depicts the recurrent loop unrolled at each point in time, showing the presence of three layers: input, hidden (where the main processing is done), and output. Learning occurs by backpropagation through time, when new weights are obtained both as a result of current processing and due to previous values; this process is formally a manifestation of feedback. The input layer performs the initial processing of the received text information, which can include tokenization, lemmatization, word embedding, etc. [17]. The hidden layer contains the rule according to which input data is processed, and the output layer converts the result into the required format.
The process of finding the weight coefficients, which takes place in the "hidden" layer, involves computing gradients based on the input data. This, in turn, can lead to two problems: a vanishing gradient and an exploding gradient. The first arises with large amounts of data or an incorrect initial configuration: the gradients determining the weight coefficients can go to 0, and the network stops learning. The second is the opposite of the first problem. To overcome these shortcomings, which can have a significant impact when considering fake news, we will further use not the classic architecture of the network but its modification with support for both short-term and long-term memory – LSTM. This will allow text information of uncertain volume to be analyzed more precisely. Below (see Fig. 4) is a schematic representation of one step of finding the weight coefficients:

Figure 4: Overview of LSTM [18]

Let's take a closer look at the indicated figure; for simplicity, we will use the original names of the elements. The first stage of the LSTM hidden layer is the Forget Box. Here, the current input and the previous output are combined and passed through the activation function so that the result lies between 0 and 1. Mathematically, this can be represented as follows:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f), (7)

where x_t – input data, h_{t-1} – previous output data, σ – activation function. As we can see, the result is then multiplied by the data from memory; that is why the stage is named forgetting. If the obtained value turns out to be 0, then, when multiplying by the memory cell, the stored value is successfully deleted.

The next stage is the Input Box, which is responsible for storing data in memory. The output range of the sigmoid function is between 0 and 1, i.e. it is always positive; in that case, all values produced by the Forget Box would be written to memory.
That is why the result is additionally multiplied by the hyperbolic tangent, which has an output range from -1 to 1; this way, some of the data is filtered out as insignificant. Mathematically, the work of this stage can be presented as follows:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i), C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C). (8)

As a result of the operation of the two indicated stages, the state of memory is formed, where C_{t-1} denotes the previous state of memory:

C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t. (9)

The obtained memory state, like the input data and the previous output, serves as the basis for the formation of the new output state h_t, which is the final stage of the model step:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o), h_t = o_t ⊙ tanh(C_t). (10)

Now we can move on to setting up an experiment that will be conducted to test the effectiveness of each type of architecture.

4. Experimental environment

The experimental environment in the current study is the following set of characteristics:
• efficiency function;
• rule for comparing the efficiency of two models;
• experimental plan.
We will define each of the specified characteristics in turn.

4.1. Efficiency function

The efficiency of a model (E) will be determined as follows:

E = (1 / t_t) · (1 / MSE) · (1 / t_p), (11)

where t_t – training time; MSE – mean square error of the forecast; t_p – data volume, expressed as the time spent by the user to prepare the model for use. The time indicators can be determined using third-party modules or software after implementing the models, for example, the time module or the software systems Postman and JMeter. To determine the accuracy of the forecast, we will use special samples with data on the Russian invasion of Ukraine [19] and the election of the president of the United States in 2020 [20]. The specified data will be divided according to the Pareto principle into two groups in the ratio of 80 to 20: 80% will be used to train the models, while the remaining 20% will serve as the real values against which the forecast is checked. After classification, to find the mean square error, it is enough to use the following formula:

MSE = (1/N) · Σ_{i=1}^{N} (y_i − ŷ_i)², (12)

where N – number of classified values; y_i – real value; ŷ_i – classified value. As the volume of data, we consider the time spent by the user to prepare the model for use.
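As an illustration, the efficiency function and the comparison rule of Section 4.2 can be sketched as follows; the multiplicative form of E and the error values are assumptions of this sketch, with only the average times taken from Tables 1 and 3 below:

```python
def efficiency(train_time_s, mse_value, prep_time_s):
    # E grows as training time, forecast error and preparation time
    # shrink; the multiplicative form is an assumption of this sketch.
    return 1.0 / (train_time_s * mse_value * prep_time_s)

def compare(e_a, e_b, eps=0.05):
    # Comparison rule with a fuzzy parameter eps: C = E_A / E_B.
    c = e_a / e_b
    if c > 1 + eps:
        return "A more efficient"
    if c < 1 - eps:
        return "B more efficient"
    return "indistinguishable"

# Illustrative values: average times come from Tables 1 and 3 of the
# paper; the MSE figures are hypothetical placeholders.
e_cnn = efficiency(301, 0.065, 391)  # model A
e_rnn = efficiency(319, 0.050, 391)  # model B
verdict = compare(e_cnn, e_rnn)
```

Under these assumed error values the slightly faster training of model A does not compensate for its larger error, so the rule favours model B.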
At the same time, to correctly compare the two models, the inverse time indicator will be considered.

4.2. Efficiency comparison rule

To compare the efficiency of the two models, we introduce the variable C, determined by the following formula:

C = E_A / E_B, (13)

where E_A – metric value for model A; E_B – metric value for model B. After obtaining the results for each model, it is possible, using a fuzzy parameter ε, to determine which model is more effective:
• C > 1 + ε: model A is more efficient than B;
• C < 1 − ε: model A is less efficient than B;
• |C − 1| ≤ ε: it is impossible to determine whether one model is more efficient than the other.

4.3. Experimental plan

Numerical data must be collected for all metrics of each model, so a controlled experiment method is used. A stable and permanent execution environment was chosen – a physical device based on Ubuntu with the following technical specifications:
• CPU: Intel Core i5-1135G7;
• RAM: 16 Gb;
• SSD: 512 Gb;
• VRAM: 4 Gb;
• OS: Ubuntu 21.04.
To determine the performance, each model is implemented using Python 3. To avoid problems with manual testing and to simplify debugging, it was decided to use the Jupyter software environment and the time module for performance evaluation.

As mentioned earlier, two test samples are used to determine accuracy. This data is divided into two parts in the ratio of 80/20. The accuracy result, obtained as the ratio of the predicted class to the real one, is aggregated over the two samples. For a better understanding, we present a fragment of the data for each sample, starting with the data on the invasion of the Russian Federation into the territory of Ukraine (see Fig. 5):

Figure 5: Part of first dataset

This dataset is described by the following fields:
• id: unique id for a news article;
• date: news publication date;
• text: the text of the article; could be incomplete;
• owner-id: id of the author of the post;
• from-id: the identifier of the author of the message;
• post-type: since the data comes from social networks, this field indicates whether the record is a reply or an actual post;
• attachments: additional images for the news;
• marked-as-ads: a label marking the article as unreliable: 1 – unreliable, 0 – reliable.

The data on the presidential elections in the USA differ in content and represent the following information (see Fig. 6):

Figure 6: Part of second dataset

This dataset is described by the following fields:
• id: unique id for a news article;
• title: the title of a news article;
• author: author of the news article;
• text: the text of the article; could be incomplete;
• label: a label that marks the article as potentially unreliable: 1 – unreliable; 0 – reliable.

First, it is necessary to remove redundant fields from the datasets: in fact, the text and the fake marker are the main ones in the current study. It is worth noting that the share of fake news in each of the datasets is about 50%. In addition, each sample will first be divided into test and training parts in the ratio of 20% to 80%, and then the training part will be dynamically divided into validation and actual training samples: in each training epoch, validation will take place on 10% of the total training sample. The final indicator for analysis is the speed of data preparation; in this case, the indicator depends solely on the person performing the test.
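The 80/20 train/test partition with a 10% validation hold-out described above can be sketched as follows; the field names and toy records are illustrative, not the actual dataset schema:

```python
import random

def split_dataset(records, test_frac=0.2, val_frac=0.1, seed=42):
    """80/20 train/test split, then hold out 10% of the training part
    for per-epoch validation, as described in the experimental plan."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    test, train_full = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# toy records keeping only the two essential fields: text and label
records = [{"text": f"news {i}", "label": i % 2} for i in range(100)]
train, val, test = split_dataset(records)
```

For 100 records this yields 72 training, 8 validation, and 20 test items; in practice the validation subset would be re-drawn each epoch.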
The following errors and uncertainties can be identified for this experiment:
• when checking the speed of work, errors related to the time measurement module and the Jupyter software environment should be expected;
• when checking the accuracy of the models, problems may arise with the data used as the test sample, as it directly affects the obtained result;
• when checking the speed of data preparation, two main problems can be identified: the human factor and the error of the time measurement tool.

5. Models implementation

First, all non-alphabetic characters (such as numbers, commas, periods, and other punctuation marks) are removed from the text using the re library, which provides access to regular expressions. After cleaning, the text goes to the processing function, whose code can be seen below (see Fig. 7):

Figure 7: Part of code for text processing

Central to this function are the methods provided by the nltk library, which supports natural language processing based on built-in word corpora. As a first step, it is necessary to remove from the set those words that do not carry an informational load (for example, "and", "or", etc.), as these words would interfere with the correct analysis. The next step is to level the linguistic variability generated by morphemes. For this, the operation of stemming is used – the process of reducing a word to its base. For example, the words "eating" and "eaten" will be replaced by the word "eat". After this processing, the array of words is reduced to a set of bases. The class built into the nltk library that performs this process does not always work correctly; therefore, to improve the result, it was decided to combine stemming with another operation – lemmatization, which brings the word form to its lemma (normal dictionary form).
For example, the words "good" and "better" have different stems, but the same lemma – "good". After carrying out the specified actions, the set of words needs another check for non-informative content. After processing the text, a dictionary is created that is necessary for the correct computation of TF-IDF characteristics. To take into account the emotional coloring of words, the next step is to find a frequency-polarity characteristic for each word by multiplying the TF-IDF and polarity indicators. The latter can be obtained using the SentimentIntensityAnalyzer class found in the vader module of the nltk library. After that, it is necessary to sum up the obtained products and get a single value for each news entry. It is worth noting that the languages of the two samples are different (English for the election sample and Russian for the war sample), so different submodules must be used for stop words.

6. Experiment results

Before directly comparing the efficiency of the two algorithms, we note that, based on the analysis of the ratio of loss to the number of epochs, the optimal number of epochs was established as 3 for CNN and 4 for LSTM. Let's move on to the efficiency parameters and start with training speed measurements using the time library. The results are shown in the table below (see Table 1):

Table 1
Training speed of neural networks

CNN      RNN
305 s    325 s
294 s    311 s
311 s    336 s
290 s    301 s
301 s    316 s
303 s    319 s
307 s    334 s
297 s    310 s
303 s    323 s
299 s    317 s

Finding the average value for each of the algorithms, we have about 301 s for CNN and ~319 s for RNN. Let's move on to the next indicator – classification accuracy. As mentioned above, the accuracy of the forecast is measured using the MSE indicator.
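For binary labels, the MSE of formula (12) reduces to the share of misclassified items, which can be checked with a short sketch (the label vectors here are toy data):

```python
def mse(y_true, y_pred):
    """Mean square error of the forecast (eq. 12): the average squared
    difference between the real label y_i and the classified label."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# toy example: binary labels with two mismatches out of ten
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
error = mse(y_true, y_pred)  # 2 wrong out of 10 -> 0.2
```

Since for 0/1 labels each squared difference is either 0 or 1, the accuracy percentages reported below can be read as 1 − MSE.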
Below we present the accuracy of the obtained results for both samples (see Table 2):

Table 2
Accuracy

Name       CNN    RNN
War        93%    94%
Election   94%    96%

The last indicator is the time of data preparation for the architectural models. The measurement results for this indicator are shown in the table below (see Table 3):

Table 3
Time of data preparation

CNN      RNN
419 s    421 s
382 s    378 s
359 s    356 s
395 s    398 s
401 s    403 s

Finding the average value of this indicator for each of the algorithms, we have ~391 s for both CNN and RNN. Having found the appropriate metrics for comparing the efficiency of the two algorithms, we move on to finding the indicator C_AB, considering CNN as model A and RNN as model B. We have C_AB ≈ 0.83. This allows us to state the higher efficiency of RNN versus CNN, although the difference is not significant: while giving gains in training speed, CNN loses in prediction accuracy. In the case of implementing the algorithms within a cloud platform, which makes it possible to significantly accelerate the speed of work, the resulting difference in speed between the models would be insignificant. However, it is worth noting that accuracy may vary depending on the data and its volume.

7. Conclusion

The current work aimed to investigate the effectiveness of using neural network architectures such as CNN and RNN to detect fake news. For this purpose, the subject area associated with false information was analysed and key patterns characterizing such data were identified. In addition, the theoretical background of both types of models was analysed.
During the investigation, it was found that standard models cannot be effectively applied to text classification, so it was decided to adopt the following provisions for further consideration:
• CNN should be understood as a convolutional neural network with a single-layer convolution and the appropriate adjustment of all its parameters;
• RNN should be understood as an LSTM network configured for text analysis, which has both long-term and short-term memory.
The next step was to define the experimental environment. As test samples, it was decided to choose data related to the Russian invasion of the territory of Ukraine and the 2020 presidential elections in the United States. As a result, it was discovered that the CNN model works faster than RNN on the presented data but gives a less accurate classification result. Given the proposed efficiency comparison rule, the stated probabilities of errors of various kinds, and the ways to overcome the disparity between the algorithms, the resulting efficiency gains can be considered insignificant. This conclusion corresponds to world scientific practice, which recommends using either of the proposed models for the analysis of textual information, especially in the presence of two classes (fake and non-fake data in the current case), or their combination in the case of checking images for authenticity. This, in turn, proves the feasibility of using similar algorithms in news filtering systems, for example as part of a social network or search engine. As further goals for deepening the research, it is proposed to consider more broadly the possibilities of combining recurrent and convolutional networks.

8. References

[1] "fake news." Cambridge Dictionary. https://dictionary.cambridge.org/dictionary/english/fake-news (accessed Feb. 8, 2023).
[2] Polygraph. "The Infodemic: Did China Deliver a COVID-19 Vaccine to Africa?" Voice of America.
https://www.voanews.com/a/covid-19-pandemic_infodemic-did-china-deliver-covid-19-vaccine-africa/6187618.html (accessed Feb. 8, 2023).
[3] L. Chinchilla. "How Does the Age of Fake News Impact Democracy in the Developing World?" Kofi Annan Foundation. https://www.kofiannanfoundation.org/supporting-democracy-and-elections-with-integrity/annan-commission/post-truth-politics-afflicts-the-global-south-too/ (accessed Feb. 8, 2023).
[4] D. Avelar. "WhatsApp fake news during Brazil election ‘favoured Bolsonaro’." The Guardian. https://www.theguardian.com/world/2019/oct/30/whatsapp-fake-news-brazil-election-favoured-jair-bolsonaro-analysis-suggests (accessed Feb. 8, 2023).
[5] K. Wesolowski. "Fake news further fogs Russia's war on Ukraine." Deutsche Welle. https://www.dw.com/en/fact-check-fake-news-thrives-amid-russia-ukraine-war/a-61477502 (accessed Feb. 8, 2023).
[6] I. Afanasieva, N. Golian, O. Hnatenko, Y. Daniiel, and K. Onyshchenko, "Data exchange model in the Internet of Things concept," Telecommunications and Radio Engineering, vol. 10, no. 78, pp. 869–878, 2019.
[7] A. Blocman. "Laws to combat manipulation of information finally adopted." IRIS Merlin. https://merlin.obs.coe.int/article/8446 (accessed Feb. 8, 2023).
[8] V. Golian, N. Golian, I. Afanasieva, K. Halchenko, K. Onyshchenko, and Z. Dudar, "Study of Methods for Determining Types and Measuring of Agricultural Crops due to Satellite Images," in 32nd International Scientific Symposium Metrology and Metrology Assurance, Sozopol, Bulgaria, Sep. 7–11, 2022.
[9] P. S. Reddy, D. E. Roy, P. Manoj, M. Keerthana, and P. V. Tijare, "A Study on Fake News Detection Using Naïve Bayes, SVM, Neural Networks and LSTM," Journal of Advanced Research in Dynamical and Control Systems, vol. 11, no. 06, pp. 942–947, 2019.
[10] J. Pennington, R. Socher, and C. D. Manning. "GloVe: Global Vectors for Word Representation." Stanford Edu. https://nlp.stanford.edu/projects/glove/ (accessed Feb. 16, 2023).
[11] M. Risdal.
"Getting Real about Fake News." kaggle. https://www.kaggle.com/datasets/mrisdal/fake-news (accessed Feb. 16, 2023). [12] A. Dey, R. Z. Rafi, S. Hasan Parash, S. K. Arko and A. Chakrabarty, "Fake News Pattern Recognition using Linguistic Analysis," 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Kitakyushu, Japan, 2018, pp. 305-309. [13] D. K. Sharma, P. Shrivastava and S. Garg, "Utilizing Word Embedding and Linguistic Features for Fake News Detection," 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 2022, pp. 844-848. [14] X. Jose, S. D. M. Kumar and P. Chandran, "Characterization, Classification and Detection of Fake News in Online Social Media Networks," 2021 IEEE Mysore Sub Section International Conference (MysuruCon), Hassan, India, 2021, pp. 759-765. [15] S. Saha. "A Comprehensive Guide to Convolutional Neural Networks - the ELI5 way." Towards Data Science. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural- networks-the-eli5-way-3bd2b1164a53 (accessed Feb. 23, 2023). [16] A. Trujjillo. "LSTM Neural Network for Time Series Forecasting." RPubs. https://rpubs.com/atrujill/717267 (accessed Feb. 23, 2023). [17] IBM Cloud Education. "Natural Language Processing (NLP)." IBM Cloud Learn Hub. https://www.ibm.com/cloud/learn/natural-language-processing (accessed Feb. 20, 2023). [18] M. West. "Explaining Recurrent Neural Networks." bouvet. https://www.bouvet.no/bouvet- deler/explaining-recurrent-neural-networks (accessed Feb. 23, 2023). [19] DS-Pr1nce. "War in Ukraine: Russian social network discussions." kaggle. https://www.kaggle.com/datasets/ustyk5/war-in-ukraine-russian-social-network-discussions (accessed Feb. 23, 2023). [20] R. Dedhia. "Fake News." kaggle. https://www.kaggle.com/datasets/ronikdedhia/fake-news (accessed Feb. 23, 2023).