<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Using Parallelized Neural Networks to Detect Falsified Audio Information in Socially Oriented Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Yakovlev</string-name>
          <email>sergiy.yakovlev@p.lodz.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Khovrat</string-name>
          <email>artem.khovrat@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kobziev</string-name>
          <email>volodymyr.kobziev@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14, Nauky, Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lodz University of Technology</institution>
          ,
          <addr-line>90-924 Lodz</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>The problem of information counterfeiting is becoming more acute as technology develops and can provoke social unrest. Modern algorithms can automatically create video, audio, and text data, and fewer and fewer people are able to recognize the deception in them. This is especially true of audio information in socially oriented systems. Different mathematical models and concepts can be used to counter this problem; the most effective of these are derivatives of convolutional and recurrent neural networks. However, the complexity and volume of the data make classification slow. One approach to solving this issue is the use of parallelism. Within the framework of the current study, given the simplicity of monitoring, it was decided to focus on MapReduce technology. In the course of this investigation, the features characteristic of audio information were identified, both in its text form and in the form of a signal. The possibilities for rapid data processing and augmentation using vector autoregression algorithms are determined. Based on an analysis of international experience, it was decided to use hybrid neural networks based on convolutional and recurrent networks. Several experiments were conducted to determine the effectiveness of the proposed approaches and the expediency of using MapReduce. The results obtained suggest that the most effective algorithm is the combination of the two selected architectures and that it is advisable to use the specified parallelization technology to accelerate the selected models. This demonstrates the practical applicability of the proposed approach to detecting falsified audio.</p>
      </abstract>
      <kwd-group>
        <kwd>classification</kwd>
        <kwd>fake information</kwd>
        <kwd>natural language processing</kwd>
        <kwd>neural networks</kwd>
        <kwd>parallelization</kwd>
        <kwd>socially oriented systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent decades, technologies capable of falsifying information have reached a level where the
need to detect forgeries in socially oriented systems is being discussed at the legislative level [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. At
the same time, how acute the problem is varies with the type of
data. In the case of video information, the distortion has not yet reached a convincing level
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For text and photos, there are already quite thorough studies and even certain
implementations for detecting forgery [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Audio falsification, by contrast, has only recently advanced beyond
what simple means of identification, that is, human hearing, can reveal. This state
of affairs allows ordinary citizens to confuse a fake recording with a real one.
      </p>
      <p>
        In normal conditions, the problem can give rise to local conflicts within a group of people; this is
especially noticeable in social networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The situation becomes more acute in conditions of military
and political instability, when any information is perceived through the prism of intensified emotions,
which slow down critical thinking. When fakes are part of the news
information field, they can accelerate the social shifts caused by the crisis and thus intensify its
consequences [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The crisis can be economic, purely social, or even military in nature, negatively affecting the
mood of the population. Examples of similar situations include the fake audio related to the COVID-19
pandemic [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the huge number of falsified recordings at the beginning of the invasion of the Russian
Federation into the territory of Ukraine [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which were used to hide violations of the laws
of war or to discredit the Armed Forces of Ukraine. It should be noted that, although the possibility of
high-quality forgery of audio data appeared relatively recently, such forgeries are already much easier to
produce than photo forgeries, which require a large amount of source material and a sufficient amount of
time. This, in turn, only increases the possible risks for society and the state.
      </p>
      <p>
        This state of affairs has led more and more countries to discuss the
fight against such information, although they mostly pay attention only to its textual component.
The issue of audio alteration detection remains quite open, although partially addressed, especially
when it comes to the alteration of real information rather than synthetic generation. Therefore, it was
decided to apply known methods of detecting forgery of text or photo data to any type
of audio. Based on world practice, neural networks, in particular convolutional and recurrent ones, are the
most effective. However, it should be noted that they require significant amounts of initial information
as well as time for training the model. To address these problems, one can use the principles of
augmentation of audio materials and parallelism, respectively. However, classical forms of
parallelization do not guarantee a significant gain in speed during model training [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As a basic
alternative to these principles, MapReduce technology is used, which, in addition to training, allows
one to speed up the process of detecting falsification. As part of the current work, a review
of models of fake audio classification based on neural networks, possible modifications of these
algorithms, and means of their parallelization will be carried out. The purpose of the work is the development
of an effective model for detecting falsification of sound data using MapReduce
technology. To achieve this goal, the following list of tasks has been defined:
• determine the features of audio in socially oriented systems;
• analyse the international experience of detecting falsification for various types of
information;
      </p>
      <p> based on the conducted analysis, form a set of limitations and assumptions, and define models
that will be used in further research;</p>
      <p> carry out a description of algorithms that would allow the reprocessing of audio information
both in the form converted to text and in the form of a signal;
 review selected architectures of neural networks and determine their main hyperparameters;
 reveal the essence of MapReduce technology and describe the implementation of the proposed
approach;</p>
      <p> form an experiment plan and a multi-criteria selection task that would allow determining the
expediency of using MapReduce and choosing the most effective classification algorithm;
 analyse the results of the experiment and form appropriate conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Domain analysis</title>
      <p>First of all, it should be noted that audio can be falsified in different ways, in particular:
• synthetic creation of audio with the help of artificial intelligence: requires
a large amount of source material and applies when the voice of a specific person is needed;
• composition of existing sound tracks to distort the essence of the original information: unlike
the previous case, generation algorithms may not be used here, but the need to obtain the voice of a
specific person remains;</p>
      <p>• contextual distortion: if the owner of the voice on the audio is not important, fake news or
announcements may simply be recorded onto the audio tracks.</p>
      <p>
        Regardless of the nature of the falsification, all cases focus on changing the context to obtain the
desired result: worsening the mood of the population, blackmail, etc. With this in mind, the
following list can be made of ways falsified audio can be detected and classified based on the information
within the message:
• use of an unnatural number of rhetorical questions (contextual distortion of socially important
information). Linguistic studies indicate that in the official, business, and journalistic styles
used by mass media, this type of speech construction is rare [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This feature is typical
for text news as well as audio and video;
      </p>
      <p> absence of negative constructions to reduce cognitive load in combination with the pessimistic
coloring of the chosen words. As an example, the replacement of the word "problem" with "catastrophe"
can be given. However, it should be noted that during oral speech, the use of profanity does not allow
one to determine the nature of the color with certainty, therefore, for further analysis, the evaluation of
such words will be formed based on the context;</p>
      <p> using appeals and incentives in inappropriate contexts. It should be noted that in the case of
contextual distortion, which aims to replace real news, such constructions immediately indicate the
falsity and incorrectness of the specified information. However, when looking at mimic recordings,
these features may be characteristics of the speech process of the attacker's target;</p>
      <p> use of an unreasonable number of pronouns. This factor mostly has room for contextual
distortion, which imitates the journalistic style of presentation.</p>
      <p>
        The specified characteristics are not exhaustive, and it should be emphasized that when
audio recordings are transformed into text form, some features may not manifest themselves.
In particular, it is known that when creating fake news, short sentences and words are often used, or
they contain a large number of various errors [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. These and similar characteristics will not be taken
into account further, since they may appear due to incorrect audio recognition or may simply be a
feature of the speech of the people present on the recording (for example, when mixing two
languages or using regional dialects). The same features can also lead to a greater number of
false positives when detecting fakes.
      </p>
      <p>
        In general, the field of determining the falsity of information is not new, as already noted above. In
the study of fake text news by several groups of Spanish scientists [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], it was shown that machine
learning by its very nature requires a sufficiently large amount of information to achieve a positive
(accuracy of more than 95%) classification result; moreover, it is quite sensitive to data outliers. However,
taking into account publicly available databases, the specified shortcomings are not significant, as was
shown in the work of Ukrainian researchers [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. A similar justification can be applied to other types
of information, including audio. This was demonstrated in their work by researchers from the
Massachusetts Institute of Technology [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        As for other methods of detecting falsified information, another quite popular approach is the
creation of graph models, which was extensively studied by scientists from Harvard [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on the example
of fake accounts; it guarantees a quick result with minimal initial data. However, when applied to audio
or text information, the method requires significant reworking, and thus the gain in speed is leveled
off. In the context of detecting fake data, it is also worth mentioning the problem of spam detection. A group
of Chinese-American scientists proved the possibility of effective application of Markov networks [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
However, given the specificity of the area, their application is quite cumbersome and requires
significant computing power, as studied by Canadian scientists from Montreal [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. When
considering visual information, such as photos or videos, one can apply autoregressive models to detect
deviations from baseline values, in other words, to detect data manipulation. This approach
has proven successful.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Mathematical representation</title>
      <p>When considering audio information, there are two main approaches to processing: treating the
data as a signal and treating it as a text record. We start the presentation of the proposed approach with
the second of these.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Analysis of audio as text</title>
      <p>Based on the analysis of the work of a group of Chinese scientists, it was determined that creating
one's own Speech to Text module is accompanied by the following problems:
• quality of recordings for training;
• lack of data for model formation (the problem is especially acute for languages with small
corpora);
• ignoring pronunciation defects;
• correct processing of dialect words, neologisms, abbreviations, etc.</p>
      <p>The specified list is not complete, so in order to avoid these problems and achieve the best
result when converting audio to text, it was decided to use Google Speech to Text, in particular an
appropriate wrapper for the Python3 programming language. Additionally, using voices
recorded by 20 people from different regions of Ukraine with various speech impairments, as well
as 20 recordings from Ukrainian-language films, it was established that:
• the system has limited capabilities in recognizing abbreviations;
• if the pauses between words are too long, the module treats them as separate sentences;
• the quality of recordings does not significantly affect the quality of recognition: the presence
of additional noise is compensated for by the audio remastering stage;</p>
      <p>• notwithstanding the above, the accuracy of recognition exceeded 95%, the
exceptions being the rural dialects of the western and eastern regions.</p>
      <p>These statements are limitations of the current work.</p>
      <p>To convert the received text information into a numerical representation, the following algorithm
will be used:
• break the text into sentences and separate words;
• remove words without a significant lexicographic load and those that do not affect the results
of the algorithms (so-called stop words), for example, "however", "this", "or", etc.;
• form a text dictionary based on the received words;
• separate the stem of each word in the dictionary and remove repetitions (perform a stemming
operation);</p>
      <p>• define a lemma for each word in the dictionary and remove repetitions (perform
lemmatization);</p>
      <p>• determine the frequency characteristic of each word (described below) and
its emotional colouring using the Sentiment Analysis tools built into the nltk module for the Python3
programming language;</p>
      <p>• modify the assessment of emotional colouring within each separate sentence based on the rules
established earlier;</p>
      <p>• aggregate and normalize the frequency-emotional indicator for each sentence; it will serve as a
target indicator for further use by the neural networks;</p>
      <p>• find the index of suspiciousness in normalized form, based on a list of words that are often
used in fake information.</p>
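      <p>The dictionary-building steps can be sketched as follows; note that the stop-word list and the suffix-stripping "stemmer" below are deliberately simplified stand-ins for the nltk tokenizers, stemmers, lemmatizers, and Sentiment Analysis tools the study actually uses.</p>

```python
import re
from collections import Counter

# Simplified stand-ins: the study relies on nltk tokenizers, stemmers,
# lemmatizers, and sentiment tools; the stop-word list and suffix
# stripping here are illustrative only.
STOP_WORDS = {"however", "this", "or", "the", "a", "is", "that", "are"}

def tokenize(text):
    """Break the text into sentences, then into lowercase words."""
    sentences = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z']+", s.lower()) for s in sentences if s.strip()]

def stem(word):
    """Naive suffix stripping as a stand-in for a real stemming operation."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_dictionary(text):
    """Tokenize, drop stop words, stem, and count word frequencies."""
    counts = Counter()
    for sentence in tokenize(text):
        for word in sentence:
            if word not in STOP_WORDS:
                counts[stem(word)] += 1
    return counts

counts = build_dictionary("The markets are crashing. This catastrophe is crashing everything.")
print(counts.most_common(2))
```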
      <p>In addition to these indicators, the following will also be used as input values:
• frequency-emotional characteristics of the 50 most popular news items on the creation date of the
audio recording. This indicator makes it possible to take into account the external news environment and adjust
the classifier's assessment accordingly;</p>
      <p>• message weight. It was decided to create a data set in which the audio is divided into 4
groups: fragments of home dialogue, general news, information from the scene of emergency events, and
news of special importance, labelled from 1 to 4, respectively. The value of the
weight indicator will also be normalized;</p>
      <p>• degree of reliability of the audio-to-text conversion. It is determined by comparing the actual
text of the message with what was produced by Google Speech to Text, as the ratio of correctly
recognized words.</p>
      <p>Regarding the frequency characteristic, it was decided to use BM25, which is a modification of
TF-IDF. The purpose of TF-IDF is to weigh the importance of each word in the query and text,
considering the frequency of the term both in a particular document and in the corpus as a whole.
Conventionally, the word "and" may be one of the most used in a particular sentence, but it is also frequent
in the corpus as a whole, so it will carry less significance when searching this corpus. The method
is based only on statistics and is calculated quite quickly, so it remains popular for problems where more
complex solutions are not required. In BM25, term-frequency saturation is added to the
TF: if a term already has a high frequency, then beyond a certain point a further increase in frequency
will not significantly affect the TF estimate. The IDF is treated similarly. In addition, two
tunable parameters, k1 and b, are used, which can be adjusted for a specific case: k1 is responsible
for frequency saturation, and b for the degree of influence of document length on the
results.</p>
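      <p>The saturation behaviour described above can be sketched with a standard BM25 scoring function. The values of k1 and b below are common defaults, not the parameters tuned for this study.</p>

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """BM25 score of one document (a list of terms) for a query.

    k1 controls term-frequency saturation and b controls document-length
    normalization, as described above. Defaults are common reference
    values, not the ones tuned in the study.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Saturated TF: grows towards (k1 + 1) as the raw frequency increases.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [["fake", "news", "alert"], ["weather", "news"], ["local", "sports"]]
doc = corpus[0]
print(bm25_score(["fake"], doc, corpus))
```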
      <p>
        The choice of TF-IDF was determined by the results of previous studies devoted to the analysis of
textual news [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In those studies, inaccuracies and limitations reduced the effectiveness of the proposed classification
methods. In particular, one of the problems was the oversaturation of certain terms that cannot be
considered stop words, for example, "catastrophe".
      </p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Analysis of audio as signal</title>
      <p>The first step in preparing audio for processing as a signal is to separate the vocalized part of the
signal from the silence. This operation is necessary because it is the vocalized part that contains the key
elements of human speech. One commonly used way of marking an audio signal is to divide it
into three states:
• region of silence (S), where there is no pronunciation;
• non-vocalized region (U), where the resulting waveform has an aperiodic or random character
(occurs when the vocal cords do not vibrate);</p>
      <p>• vocalized region (V), where the resulting waveform is quasi-periodic (occurs when the
speaker's vocal cords are tense and, accordingly, vibrating).</p>
      <p>Regarding the combination of the first two regions indicated above, it is worth noting that there are
indeed methods that can separate silence from non-vocalized speech, but they require constant
reconfiguration for different environments, which in the context of audio news is an inefficient
procedure. A similar problem exists for methods that separate the vocalized region from the others based
on low energy. Therefore, taking this into account, along with the fact that the noisy environment of
news can vary but is relatively stable, it was decided to apply distribution-based methods, assuming that
the signal has a Gaussian nature. Thus, to isolate the necessary part of the audio, one can
use the Mahalanobis distance function, which acts as a linear pattern classifier (LPC).</p>
      <p>To determine the parameters of the Gaussian distribution, it is necessary to choose a base
window. For this purpose, it was decided to apply the method of expert evaluation: 30 sound processing
specialists from Kharkiv, Kyiv, Dnipro, and Vienna were interviewed, and it was established that the
optimal window size is 200 ms. The Gaussian distribution for the one-dimensional case is determined
via the following formula:
f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)),   (1)
where μ is the mean and σ is the standard deviation.</p>
      <p>Given the above, the three-sigma rule for distribution (1) yields the following bound:
P(|x − μ| &lt; 3σ) ≈ 0.997.   (2)
The Mahalanobis distance in the one-dimensional case can be determined using the formula:
D(x) = |x − μ| / σ.   (3)</p>
      <p>Taking into account (2) and (3), it can be established that with a probability of 99.7% the distance
will be less than 3.</p>
      <p>The process proceeds as follows:
• the algorithm performs a gradual analysis of the audio in a 200 ms window and determines the
standard deviation and the mean value;</p>
      <p>• for each subsequent window, the Mahalanobis distance is calculated using the previously
obtained values;</p>
      <p>• if the distance exceeds 3, the sample is considered vocalized; otherwise, it can be replaced with
empty values, effectively deleting it.</p>
      <p>The neural network will receive the transformed audio as input, but this time the sampling window
will be determined by cross-validation. As mentioned above, to avoid the problem of a lack of audio
signals, it was decided to carry out the augmentation process using autoregression. Formally, a vector
autoregression of order p can be written as
y_t = A_1 · y_{t−1} + A_2 · y_{t−2} + … + A_p · y_{t−p} + ε_t,   (4)
where y_t is an N-dimensional time series and A_1, …, A_p are non-degenerate matrices of autoregression
coefficients of dimensionality N×N.</p>
      <p>Here it is worth noting that since the matrices of coefficients are non-degenerate, it is easy to normalize
them in the range from 0 to 1. Model (4) can be used when considering short-term periods;
in addition, it is necessary to guarantee the absence or insignificance of external influence. Since within
the scope of this work it was decided to consider the medium-term perspective, it is necessary to find
the delta between the forecast (4) and the forecast for the previous period.</p>
      <p>The resulting matrix of coefficients, subject to the total number of unknowns that must be taken into
account during generation, may vary depending on the selected model. Within this study, the following
variations will be considered:
• simple vector autoregression;
• seasonal vector autoregression;
• vector autoregression with distributed lag;
• vector autoregression with moving average;
• vector autoregression with integrated moving average.</p>
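      <p>A minimal numpy sketch of the simplest of the variants listed above, fit by least squares; in practice a library such as statsmodels would provide these models. The toy data below is purely illustrative.</p>

```python
import numpy as np

def fit_var1(series):
    """Least-squares fit of a VAR(1) model y_t = A @ y_{t-1} + e_t.

    `series` has shape (T, N); returns the N x N coefficient matrix A.
    A sketch of the simplest of the autoregression variants listed above.
    """
    past, present = series[:-1], series[1:]
    # Solve present ≈ past @ A.T in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coeffs.T

def augment(series, a_matrix, steps, noise_scale=0.0, seed=0):
    """Generate `steps` synthetic samples by iterating the fitted model."""
    rng = np.random.default_rng(seed)
    samples = [series[-1]]
    for _ in range(steps):
        samples.append(a_matrix @ samples[-1]
                       + rng.normal(0, noise_scale, series.shape[1]))
    return np.array(samples[1:])

# Toy 2-dimensional series generated from a known A, then recovered.
true_a = np.array([[0.5, 0.1], [0.0, 0.3]])
rng = np.random.default_rng(1)
y = [np.array([1.0, 1.0])]
for _ in range(500):
    y.append(true_a @ y[-1] + rng.normal(0, 0.1, 2))
y = np.array(y)
a_hat = fit_var1(y)
synthetic = augment(y, a_hat, steps=5)
print(np.round(a_hat, 2))
```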
      <p>In this case, the accuracy of data augmentation can be considered the only effectiveness factor:
in essence, it is necessary to ensure that the distribution between vocalized and non-vocalized 200 ms
samples does not change.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Architectures of neural networks</title>
      <p>The chosen BiLSTM and CNN-RNN architectures are inherently descendants of classical
convolutional and recurrent neural networks.</p>
      <p>An RNN consists of several hidden layers that work one after another, with each
subsequent layer receiving as input the result of the previous one. This feature
is called short-term memory, by analogy with the human brain.</p>
      <p>Schematically, the architecture of a recurrent neural network can be presented in the form shown in
Figure 1.</p>
      <p>Inside the hidden layers, the data is gradually processed using gradients. However, if the data
is specific and not inherently bounded (either above or below), exploding or vanishing gradient
problems may arise: situations when the gradient value heads towards infinity or 0,
respectively. Based on international experience, this problem can occur when analyzing text information
(and signals as well). To overcome this shortcoming, it was decided to use neural networks with
support for short-term and long-term memory (LSTM).</p>
      <p>The essence of the mathematical apparatus in LSTM consists in the gradual use of several sigmoid
and hyperbolic tangent functions, which adjust the values in such a way as to avoid divergence
both to 0 and to infinity:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f),
i_t = σ(W_i · [h_{t−1}, x_t] + b_i),
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t,
o_t = σ(W_o · [h_{t−1}, x_t] + b_o),
h_t = o_t ⊙ tanh(c_t),   (5)
where σ is the sigmoid function, ⊙ denotes elementwise multiplication, x_t is the input, h_t the
short-term memory (output), and c_t the long-term memory state.</p>
      <p>At the first stage, the Forget Gate combines the weighted input and previous output and passes
them through the sigmoid activation function. The range of values of this function limits the result of
this step to the range from 0 to 1. After that, the result is multiplied by the value from the long-term
memory channel.</p>
      <p>The second stage is the Input Gate, which performs a similar multiplication by a sigmoid function.
However, this time the result is corrected by applying a hyperbolic tangent. This operation
eliminates the problem of divergence to infinity when the result is added to the value from the
long-term memory channel.</p>
      <p>As a result of these two stages, a memory state is formed, which, together
with the input and previous output data, serves as the basis for forming a new value of
short-term memory. This stage is called the Output Gate and is the final step of one hidden layer. Its
essence is to take the hyperbolic tangent of the long-term memory value, which is then multiplied by the
sigmoid function of the previous value in the short-term memory.</p>
      <p>The steps described above can be presented as shown in Figure 2.</p>
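      <p>Under the standard LSTM formulation assumed here, one cell step can also be sketched in plain numpy; the weights below are random placeholders, not trained values.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, weights):
    """One LSTM cell step: Forget, Input, and Output gates as described above.

    `weights` maps gate names to (W, b) pairs acting on [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(weights["f"][0] @ z + weights["f"][1])    # Forget Gate
    i_t = sigmoid(weights["i"][0] @ z + weights["i"][1])    # Input Gate
    c_hat = np.tanh(weights["c"][0] @ z + weights["c"][1])  # candidate memory
    c_t = f_t * c_prev + i_t * c_hat                        # long-term memory
    o_t = sigmoid(weights["o"][0] @ z + weights["o"][1])    # Output Gate
    h_t = o_t * np.tanh(c_t)                                # short-term memory
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
weights = {g: (rng.normal(0, 0.1, (hidden, hidden + inputs)),
               np.zeros(hidden)) for g in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, weights)
print(h)   # bounded output: every entry lies strictly inside (-1, 1)
```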
      <p>Although this architecture avoids the gradient problem, it still has one significant drawback when
processing natural language: the impossibility of taking future context into account.</p>
      <p>Suppose we have the beginning of the sentence "Apple is something that...". A conventional LSTM
architecture will not be able to determine what exactly is meant by "Apple", a fruit or a company,
because it has no information about the end of the sentence. For it, the continuations "Apple is
something that competitors simply cannot reproduce." and "Apple is something that I like to eat." are
indistinguishable. To avoid this problem, it was decided to use a bidirectional recurrent neural network
with support for long- and short-term memory (BiLSTM). The essence of this network is the combination
of two LSTMs directed in opposite directions. In this case, the "auxiliary network" makes it possible
to take the later context into account for the beginning of the sentences.</p>
      <p>After the two subnetworks have run, the results of both are combined, first by simple concatenation
and then by linear transformations. To determine the relevant operations, it was decided to conduct
cross-validation, during which it was established that the best result is achieved when using averaged
values. A similar conclusion was obtained in an expert evaluation among 10 people engaged in
natural language processing. Schematically, the architecture of a bidirectional recurrent neural network
with long- and short-term memory support can be presented in the form shown in Figure 3.</p>
      <p>Compared to the previous case, the CNN part of the CNN-RNN architecture has neither short-term
nor long-term memory. Instead, it uses a convolution layer, which significantly reduces the
dimensionality of the output data. It is especially effective in pattern recognition and in detecting
falsification of images or videos.</p>
      <p>In order to build the most effective CNN architecture, it is necessary to set a number of
hyperparameters of the model. One of the most important is the size of the filter, the
element of the hidden layer that traverses and convolves the data. After cross-validation,
it was determined that a 5×5×5 filter is best for the chosen case.</p>
      <p>Here it is worth noting that the last value of the dimension in our case corresponds to the number
of descriptors that make up the target variable. That is why for the analysis of audio as text the filter
dimension will be 5×5×5, but in the case of analysing audio as a signal the authors consider only the
standard deviation and the mean value as descriptors, so the dimension of the filter will be 5×5×2.</p>
      <p>As the filter passes through the data, a scalar product is taken between the filter entries and the input
information. This makes it possible to form an activation map whose size equals the
number of filters used, in other words, the depth of the neural network. Based on the number of factors
taken into account during classification, it was decided to settle on depths of 5 and 2,
respectively.</p>
      <p>In addition to the specified hyperparameter, the following characteristics are considered for the
current architecture:</p>
      <p>• size of the kernel (during cross-validation, kernels ranging from 2 to 5 were tried and the
optimal value was found to be 4);</p>
      <p>• step size (stride): based on recommendations indicating that a stride greater than 3 is
undesirable for text, an optimal value of 1 was determined;
• given the chosen stride, zero padding will not be applied;
• given the specifics of the subject area, it was decided not to apply the bias
parameter.</p>
      <p>It should be noted that the number of convolutional network layers for textual information analysis
must be equal to 1.</p>
      <p>The schematic architecture can be represented as shown in Figure 4.</p>
      <p>The problem with this architecture for natural language processing is its limited ability to take
context into account. Passing the filter normally captures the context around each word, but Ukrainian
is characterized by long sentences, so the salient context may lie outside the CNN filter. To avoid this
problem, it was decided to combine an RNN and a CNN. Although there are several ways of making
such a combination, the current work considers only the RCNN architecture, which uses the two neural
networks sequentially: after convolution, the result is not merely concatenated but sent to a recurrent
neural network layer.</p>
      <p>To avoid the above-mentioned problems when considering text, it was decided to use not the
classic RNN architecture but the aforementioned bidirectional recurrent neural network with support
for long- and short-term memory. Thus, the RCNN architecture can be presented in the form shown in
Figure 5. As mentioned above, the use of complex neural networks has a significant drawback: the time
needed for their training and for data processing. To reduce the impact of these shortcomings, it was
decided to use the MapReduce technology.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4. MapReduce technology</title>
      <p>
        The MapReduce technology consists of dividing the original data set so that it is processed on
separate nodes. The key operations are the mapping and reduction functions: the first distributes the
data among the nodes on which the desired processing is carried out, while the second collects the data
from all nodes and unifies it. It is worth noting that MapReduce defines only the general scheme of the
corresponding modules; their implementation within particular frameworks can differ significantly. For
this paper, it was decided to use the MapReduce implementation offered by Hadoop [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Graphically, the proposed
solution can be presented as shown in Figure 6.
      </p>
      <p>For the above case, special attention should be paid to the distribution and combination functions.
They are needed to implement additional parallelization inside each node, using different memory
regions. To better understand this approach, the basic nodes can be thought of as processes and the
specified memory regions as threads.</p>
      <p>In addition to these two functions, an important feature is the sorting of information before reduction.
The current work considers data that depends significantly on order yet carries no additional time
labels, as time series do. To avoid problems during reduction, it was decided to add a field with the
sequence number of each fragment of text/signal, which is then used for sorting.</p>
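The sequence-number fix can be sketched in plain Python (a stand-in for the Hadoop job; the `process` step is a toy placeholder):

```python
import random
from operator import itemgetter

def process(frag):
    return frag.upper()  # placeholder for the real per-node work

def map_phase(fragments):
    # Tag each fragment with its sequence number so the original
    # order survives the distributed shuffle.
    return [(seq, process(frag)) for seq, frag in enumerate(fragments)]

def reduce_phase(tagged):
    # Sort by the added sequence-number field before combining,
    # exactly because the data carries no timestamps of its own.
    tagged.sort(key=itemgetter(0))
    return " ".join(value for _, value in tagged)

tagged = map_phase(["fake", "audio", "news"])
random.shuffle(tagged)          # nodes finish in arbitrary order
result = reduce_phase(tagged)
print(result)  # FAKE AUDIO NEWS
```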
      <p>Regarding the implementation of the proposed approach, the following is worth noting.
MapReduce is used independently during the preprocessing of the raw data and during the training of
the neural networks. In the case of signals, it is only important to carry out the reduction in the correct
order. To process audio as text, it is necessary to take into account the importance of forming as large
a vocabulary as possible. For this purpose, it was decided to create a separate non-relational database
with multithreading support, in which the entire accumulated dictionary is recorded after basic
processing. Thus, the more material is processed, the higher the accuracy of the resulting frequency
characteristic.</p>
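The authors keep the vocabulary in a non-relational database with multithreading support; as an in-memory stand-in, the shared frequency dictionary might be guarded as follows (an illustrative sketch, not the actual database code):

```python
import threading
from collections import Counter

class Vocabulary:
    """In-memory stand-in for the shared multithreaded dictionary store."""
    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def add(self, tokens):
        with self._lock:            # one writer at a time
            self._counts.update(tokens)

    def frequency(self, word):
        with self._lock:
            return self._counts[word]

vocab = Vocabulary()
docs = [["fake", "news"], ["fake", "audio"]] * 50
threads = [threading.Thread(target=vocab.add, args=(d,)) for d in docs]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(vocab.frequency("fake"))  # 100
```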
      <p>For the RCNN architecture, the first step is the CNN layer. It iteratively adjusts the weights by
computing their partial gradients after each set of training data is propagated through the network.
Parallelization during the training phase can therefore be achieved by dividing the data into several
fragments. Each fragment is fed into its own CNN, and each CNN is trained independently in parallel.
The results are then aggregated by a reducer to produce the final values, which are used to update the
weights for the next iteration. After the CNN layer is finished, the aggregated data is transferred to the
BiLSTM. To speed up the bidirectional network, the work of its two constituent networks can be
distributed between two nodes; in that case the reduction function aggregates the results of the two
networks. Among the general advantages of the proposed approach are its scalability, relatively low
cost, ease of use, and the possibility of execution monitoring using the appropriate Hadoop tools (if
internal monitoring is necessary, basic methods of the Python programming language are used).</p>
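A toy numpy sketch of this map/reduce training loop, with a linear least-squares model standing in for the CNN (the sharding and gradient averaging are the point; all names here are hypothetical):

```python
import numpy as np

def partial_gradient(w, X, y):
    # Mean-squared-error gradient on one shard -- stands in for a
    # CNN's backward pass on its fragment of the training data.
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)

for _ in range(200):
    shards = np.array_split(np.arange(len(y)), 4)        # map: 4 nodes
    grads = [partial_gradient(w, X[i], y[i]) for i in shards]
    w -= 0.1 * np.mean(grads, axis=0)                    # reduce: aggregate

print(np.round(w, 3))  # ≈ [ 1. -2.  0.5]
```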
      <p>Among the disadvantages, the authors single out the large amount of software code that must be
written, the opacity of processing (although log files can be inspected), and the need for lengthy
configuration.</p>
    </sec>
    <sec id="sec-8">
      <title>4. Experimental environment</title>
      <p>In this article, the experimental environment refers to the task of multi-criteria selection to determine
the most efficient classification method and to the experimental design describing the data samples and
the MapReduce setup.</p>
    </sec>
    <sec id="sec-9">
      <title>4.1. Plan of the experiment</title>
      <p>Given the specificity of the proposed research, the method of a controlled experiment was chosen.
The base runtime has the following characteristics:
• CPU: Intel Core i5-1135G7;
• RAM: 16 GB;
• VRAM: 4 GB;
• OS: Ubuntu 21.04.</p>
      <p>These characteristics were almost completely duplicated on the virtual nodes on which partial
computation is carried out (RAM was reduced from 16 to 8 GB). Their number ranges from 3 to 4
(2 in the case of parallelization of bidirectional neural networks).</p>
      <p>The datetime library for Python 3 was chosen as the means of measuring execution time. So that
basic calculations do not slow down the program, the numpy and polars libraries are used. For natural
language processing (including lemmatization, tokenization, and other necessary functions), the Python
version of the nltk library was chosen. To implement the neural networks, tensorflow was used,
together with the tools provided by its pipeline submodule.</p>
      <p>When considering audio as text, as already mentioned, it was decided to use the Google
Cloud Platform to avoid the influence of the environment on the speed of work. The optimization of
this tool reduced the delay in receiving the text transcription from 10 seconds (for processing 2 minutes
of audio) to 2 seconds; the first value was obtained with a self-built speech recognition tool.</p>
      <p>As verification data, the authors use a set of audio recordings generated manually, based on text
news and partially modified in the various ways described earlier.</p>
      <p>The first data set concerns the full-scale invasion of Russia on the territory of Ukraine and consists
of both simple news and reactions to certain events by users of social networks, television experts, etc.
The second data set is dedicated to the 2019 election process, which was accompanied by the
appearance of a large amount of falsified information. Each of these samples is split 80 to 20 into
training and test subsamples, respectively.</p>
    </sec>
    <sec id="sec-10">
      <title>4.2. The problem of multi-criteria selection</title>
      <p>To compare the mentioned types of neural networks with and without the use of MapReduce, the
main selection criteria must be defined. Given that classification for socially acute processes is
considered, the most important criteria are time savings and classification accuracy.</p>
      <p>In general, the following list of factors was chosen:
• accuracy indicator;
• saving of model training time at the same capacity;
• saving of data processing time;
• saving of the minimum permissible amount of data needed to achieve Accuracy = 90%;
• the possibility of taking context into account.</p>
      <p>It is worth noting that the data reprocessing time-saving indicator is important only for choosing
the autoregressive algorithm and for measuring the speed gain provided by MapReduce. Therefore,
only four of the above metrics are used to evaluate the performance of the neural networks.</p>
      <p>Model training time savings are measured in seconds using the above library; the authors do not
impose a limit on the indicator itself.</p>
      <p>To reduce the impact of possible measurement errors caused by the accuracy of the time modules
or by the environment, it was decided to take 5 measurements for the time indicators and to check the
prediction accuracy on two data samples.</p>
      <p>Classification accuracy is determined using a combination of F1-score and Precision, normalized to
the range from 0 to 1. The authors measure accuracy on the two samples and take the average value.</p>
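The text fixes the metric as a combination of F1-score and Precision but not the exact formula; a minimal sketch, assuming their arithmetic mean and using hypothetical confusion counts for the two samples:

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def combined_accuracy(tp, fp, fn):
    # Assumed combination: the mean of F1 and Precision,
    # which is already normalized to [0, 1].
    return (f1(tp, fp, fn) + precision(tp, fp)) / 2

# hypothetical confusion counts (tp, fp, fn) for the two data samples
samples = [(80, 10, 10), (70, 5, 15)]
score = sum(combined_accuracy(*s) for s in samples) / len(samples)
print(round(score, 3))  # 0.897
```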
      <p>Given that context can be taken into account to varying extents, as demonstrated by the LSTM and
BiLSTM architectures, the following scale was adopted:
• the context is fully taken into account in both directions - 5 points;
• the context is taken into account only in one direction - 4 points;
• the context is taken into account only in the neighborhood in both directions - 3 points;
• the context is taken into account only in the neighborhood in one direction - 2 points;
• the context is not taken into account - 1 point.</p>
      <p>It should be noted that this indicator will not be used when classifying signals.</p>
      <p>To determine which of the models is the most effective according to the above criteria, it was decided
to apply the principle of linear additive convolution with weighting coefficients. To determine the
weighting coefficients, it was decided to conduct an expert evaluation among journalists and analysts
(the number of reviewers was 50 people).</p>
      <p>In classifying both signals and text, the accuracy indicator is the most important; the possibility of
taking context into account comes second, and the time indicators come third. Accordingly, the
following points were assigned:
• accuracy: 16 points;
• the possibility of taking context into account: 10 points;
• saving of model training time: 2 points;
• saving of the minimum amount of data to achieve the required accuracy: 2 points.
With this in mind, the following weighting coefficients were obtained for each criterion:
• accuracy: 16/30 = 8/15 in the case of audio as text, 16/20 = 0.8 in the case of audio as a signal;
• the possibility of taking context into account: 10/30 = 5/15 in the case of audio as text;
• saving of model training time: 2/30 = 1/15 in the case of audio as text, 2/20 = 0.1 in the case of audio as a signal;
• saving of the minimum amount of data to achieve the required accuracy: 2/30 = 1/15 in the case of audio as text, 2/20 = 0.1 in the case of audio as a signal.</p>
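The weighting scheme above can be reproduced directly; the normalized metric values passed to the convolution below are hypothetical, for illustration only:

```python
def weights_from_points(points):
    """Turn expert points into weights that sum to 1."""
    total = sum(points.values())
    return {k: p / total for k, p in points.items()}

# expert points from the survey above
text_points = {"accuracy": 16, "context": 10, "train_time": 2, "min_data": 2}
signal_points = {"accuracy": 16, "train_time": 2, "min_data": 2}

w_text = weights_from_points(text_points)      # 8/15, 5/15, 1/15, 1/15
w_signal = weights_from_points(signal_points)  # 0.8, 0.1, 0.1

def additive_convolution(weights, normalized_metrics):
    # Linear additive convolution: weighted sum of normalized criteria.
    return sum(weights[k] * normalized_metrics[k] for k in weights)

# hypothetical normalized metric values for one model
metrics = {"accuracy": 0.95, "context": 1.0, "train_time": 0.4, "min_data": 0.6}
print(round(additive_convolution(w_text, metrics), 3))  # 0.907
```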
      <p>The next important element of the experimental environment is the identification of possible errors.
Based on the described plan, the following factors can affect the result:
• during the time-saving check: the human factor and instrumental error;
• during the accuracy check: data problems.</p>
      <p>To mitigate these uncertainties, as already mentioned, the indicators are measured several times.
Having considered the main aspects of the experimental environment, the authors proceed to the
discussion of the implementation of the chosen approach.</p>
    </sec>
    <sec id="sec-11">
      <title>5. Models implementation</title>
      <p>Since this process is relatively trivial for a signal, the augmentation process based on autoregression
is considered for illustration. As already mentioned, MapReduce is used to speed it up. In the selected
Hadoop-based implementation, the reducer and mapper functions remain the most important, despite
the presence of additional structures on the nodes.</p>
      <p>To make the algorithm language-independent and easier to understand, pseudocode is provided
below for each MapReduce function. The mapper is shown in Figure 7.</p>
      <p>In the case of textual information, one of the main stages is content analysis, which first forms a
dictionary of all important words and then computes the BM-25 characteristic. As already mentioned,
the nltk library, which contains a large number of word corpora, was chosen for this purpose.</p>
      <p>The Porter stemmer was chosen for stemming, and the WordNet lemmatizer was used for
lemmatization. These tools showed high accuracy in determining the stems and lemmas of
Ukrainian words.</p>
      <p>To measure this accuracy, fragments of the processed news texts were compared with the
corresponding processing carried out by linguists from Kharkiv, Dnipro, and Lviv. If dialect forms are
not taken into account, the accuracy was 100%. In the case of English, the accuracy remained
unchanged.</p>
      <p>The BM-25 components were implemented using the sklearn library. The part of the code that
performs dictionary formation for the English language is shown in Figure 8:</p>
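sklearn does not ship a ready-made BM-25 scorer, so the components were presumably assembled from its vectorizers; a minimal pure-Python sketch of the standard Okapi BM-25 formula (the parameters k1 = 1.5 and b = 0.75 are conventional defaults, not values taken from the paper) looks like this:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a query with Okapi BM-25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this document
        s = 0.0
        for term in query:
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores

docs = [["fake", "audio", "news"], ["weather", "report"], ["fake", "signal"]]
print(bm25_scores(["fake", "news"], docs))
```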
      <p>After processing, the text data is transferred to the non-relational MongoDB database. The rejection
of the classic tabular storage architecture was due to the following factors:
• the higher speed of downloading data from the database, which reduces the text reprocessing time;
• the need to store audio files in the database.</p>
      <p>The next implementation step is programming the neural networks. To avoid implementation errors,
as already mentioned, the tensorflow library was used. Although it does not provide BiLSTM and
RCNN architectures directly, it allows the creation of Pipeline structures that can contain several
models and preprocessing functions. Combining such structures with our own aggregating structures
makes it possible to build the specified models using MapReduce. As an example, the authors give
pseudocode for the reduction that must be used when aggregating the results of several convolutional
neural networks; it is shown in Figure 9:</p>
      <p>Separately, it is worth noting that the code fragments responsible for communication with the Google
Speech-to-Text API, for data transfer to the database, for individual preprocessing elements, and for
the aggregation of results were omitted to simplify the presentation. Having considered the
peculiarities of the implementation of the chosen approach, the authors can proceed to the analysis of
the results of the conducted experiment.</p>
    </sec>
    <sec id="sec-12">
      <title>6. Experiment results</title>
      <p>Before presenting the results, the value of the possibility of taking context into account is indicated
for each of the above models:
• CNN – 3 points: the context is taken into account only in the neighborhood, in both directions, via the convolution function;
• RNN – 2 points: the context is taken into account only in the neighborhood, in one direction, due to the presence of short-term memory;
• LSTM – 4 points: the context is taken into account only in one direction;
• BiLSTM – 5 points: the context is fully taken into account in both directions due to the bidirectional nature of the network;
• RCNN – 5 points: the context is fully taken into account in both directions via the convolution functions and a bidirectional network with long-term memory.</p>
      <p>The overview of the experiment results starts with the data reprocessing time-saving indicator for
the signal use case (augmented using autoregressive models). The results are shown in Table 1 (all
savings values are calculated relative to the slowest algorithm, the sequential version of integrated
moving average vector autoregression).</p>
      <p>The following notations were introduced to simplify the presentation:
• VAR – simple vector autoregression;
• VARS – seasonal vector autoregression;
• VARL – vector autoregression with distributed lag;
• VARMA – vector autoregression of the moving average;
• VARIMA – vector autoregression of the integrated moving average.</p>
      <p>The average time savings are: for VAR – 0.058 s, for VARS – 0.046 s, for VARL – 0.035 s, for
VARMA – 0.017 s, for VARIMA – 0 s. As can be seen, the moving average and integrated moving
average algorithms are significantly slower on average. This is explained by the fact that they fully
take exogenous variables into account while also performing noise adjustment. Since augmentation
accuracy is not the most important indicator in our case, it was decided, given these results, to use
classical vector autoregression.</p>
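A minimal sketch of what classical vector autoregression for augmentation could look like, assuming a VAR(1) model fitted by least squares over the two signal descriptors (illustrative only, not the authors' implementation):

```python
import numpy as np

def fit_var1(series):
    """Fit X_t = A @ X_{t-1} + c by least squares (classic VAR of order 1)."""
    X_prev = np.hstack([series[:-1], np.ones((len(series) - 1, 1))])
    coef, *_ = np.linalg.lstsq(X_prev, series[1:], rcond=None)
    return coef[:-1].T, coef[-1]          # A (k×k), intercept c

def augment(series, A, c, steps, noise=0.0, rng=None):
    """Generate a synthetic continuation of the signal descriptors."""
    rng = rng or np.random.default_rng(0)
    out = [series[-1]]
    for _ in range(steps):
        out.append(A @ out[-1] + c + noise * rng.normal(size=len(c)))
    return np.array(out[1:])

rng = np.random.default_rng(1)
# two descriptors per frame (mean and standard deviation, as above)
series = np.cumsum(rng.normal(size=(200, 2)) * 0.05, axis=0) + 1.0
A, c = fit_var1(series)
synthetic = augment(series, A, c, steps=50)
print(synthetic.shape)  # (50, 2)
```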
      <p>The speed gain for the MapReduce versions is ~2.9 for each of the models; if the MapReduce
configuration is improved by increasing the number of nodes to 4, the gain reaches ~3.74. In the case
of audio-as-text analysis, there is only a difference between the parallelized and the sequential version:
with three MapReduce nodes the acceleration was ~3.1, and with four nodes ~4.3. The acceleration
exceeds the number of nodes due to additional optimization of the requests to the Google
Speech-to-Text API. The results of the training time-saving measurements for audio as a signal are
shown in Table 2. This time, parallelization is applied only to the BiLSTM and RCNN architectures,
to determine the overall level of speedup; the model built on the RCNN architecture is the slowest.</p>
      <sec id="sec-12-6">
        <title>Training time measurements</title>
        <p>Table 2. Training time savings in seconds, five measurements per model. Sequential:
CNN – 50, 49, 52, 54, 50; RNN – 48, 44, 47, 45, 46; LSTM – 29, 30, 28, 33, 30;
BiLSTM – 15, 14, 17, 16, 18; RCNN – 0, 0, 0, 0, 0. MapReduce: BiLSTM – 30, 29, 31, 28, 30;
RCNN – 25, 24, 27, 24, 26.</p>
        <p>The average values are: CNN – 51 s; RNN – 46 s; LSTM – 30 s; BiLSTM – 16 s; RCNN – 0 s.
Thus, the more complex the architecture, the slower the training of the corresponding model. The
speedup obtained with MapReduce for BiLSTM was 2 (explained by the ease of parallelization and
the limitation to 2 nodes); for RCNN it was ~3.52 (with four nodes the result increased to ~4.68).
Similar measurements for training the selected neural networks on textual data gave the following
average time savings: CNN – 45 s; RNN – 41 s; LSTM – 24 s; BiLSTM – 12 s; RCNN – 0 s;
MapReduce-based BiLSTM – 24 s; MapReduce-based RCNN – 22 s.</p>
        <p>The average time savings are smaller, which is explained by the specifics of natural language
processing. This time, the acceleration obtained using MapReduce for BiLSTM was 2 (the situation is
similar to the previous one); for RCNN it was ~3.2 (with 4 nodes the result increased to ~4.41). It
should be noted that the use of MapReduce did not affect the classification accuracy for either data
set, so the corresponding calculations are omitted. In general, the accuracy results for audio as a signal
are shown in Table 3.</p>
        <p>Considering the obtained results, RCNN guarantees the highest classification accuracy (although
the difference from BiLSTM is not significant). When audio is considered as text, the situation hardly
changes, as the results shown in Table 4 indicate.</p>
        <p>The final metric is the training sample size needed to achieve at least 90% accuracy. For this,
several iterations were carried out with a gradual increase in the number of records from 5000 to
10,000 (in the case of audio signals, the vast majority were the result of augmentation).</p>
        <p>The specified accuracy is achieved with 7000 records for CNN; 7500 for RNN; 6600 for LSTM;
6100 for BiLSTM; and 5800 for RCNN. Thus, the minimum permissible data saving is 1700 records
for RCNN, 1400 for BiLSTM, 900 for LSTM, 500 for CNN, and 0 for RNN. The obtained metric
values can now be systematized for the case of audio processing as a signal (see Table 5); all values
were normalized and rounded to the nearest hundredth. Based on these results, the Pareto-optimal
alternatives are determined (see Table 6).</p>
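The selection of Pareto-optimal alternatives can be sketched as follows; the normalized criterion values here are hypothetical, not the paper's actual Table 5:

```python
def pareto_optimal(alternatives):
    """Keep alternatives that no other alternative dominates on every criterion."""
    names = list(alternatives)

    def dominates(a, b):
        av, bv = alternatives[a], alternatives[b]
        return (all(x >= y for x, y in zip(av, bv))
                and any(x > y for x, y in zip(av, bv)))

    return [n for n in names
            if not any(dominates(m, n) for m in names if m != n)]

# hypothetical normalized rows: (accuracy, training-time saving, data saving),
# higher is better for each criterion
table = {
    "CNN":    (0.91, 1.00, 0.29),
    "RNN":    (0.90, 0.90, 0.00),
    "LSTM":   (0.93, 0.59, 0.53),
    "BiLSTM": (0.96, 0.31, 0.82),
    "RCNN":   (0.97, 0.00, 1.00),
}
print(pareto_optimal(table))  # ['CNN', 'LSTM', 'BiLSTM', 'RCNN']
```

Here RNN is dominated by CNN (worse or equal on every criterion), so it drops out of the Pareto set.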
        <p>Based on the obtained results, the value of the linear additive convolution with weighting
coefficients can be calculated: for CNN – 0.849, for LSTM – 0.856, for BiLSTM – 0.881, and for
RCNN – 0.876. Thus, the most effective model for detecting falsification of audio as a signal is the
bidirectional recurrent neural network with support for long- and short-term memory. However, the
difference between BiLSTM and RCNN is small enough to be considered within the margin of error.
In addition, the significant speed advantage of BiLSTM is partially neutralized by the use of
MapReduce technology. The authors now proceed to systematize the results obtained during the
classification of audio as text. The corresponding normalized values are given in Table 7, and the
Pareto-optimal alternatives determined from them are shown in Table 8.</p>
        <p>Based on these results, the value of the linear additive convolution with weighting coefficients is:
for CNN – 0.771, for LSTM – 0.833, for BiLSTM – 0.918, and for RCNN – 0.917. As in the previous
case, the most efficient model is BiLSTM, although its speed advantage is reduced by parallelization.
Considering the above, the most effective models are BiLSTM and RCNN, and the results of applying
MapReduce prove the feasibility of its use in the classification of fake audio recordings of various
kinds.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>7. Conclusion</title>
      <p>The current work aimed to develop an effective model for determining the fact of falsification of
sound data, using MapReduce technology. For this purpose, an analysis of the features of falsification
of audio information, both in the form of a signal and in textual representation, was carried out. In
addition, the analysis of modern scientific publications devoted to the chosen topic and several expert
surveys allowed us to form a set of algorithms for creating our model for identifying fake audio. The
first stage of this model includes:</p>
      <p>• in the case of audio as text: data processing using Google Speech-to-Text and subsequent
conversion of the text into a numerical representation based on the frequency-emotional characteristics
(found using the BM-25 algorithm) of the message itself and of the latest verified news, the degree of
conversion reliability, and the message weight;</p>
      <p>• in the case of audio as a signal: cleaning the signal of noise and non-vocalized areas, followed
by augmentation using vector autoregression parallelized with MapReduce (the results of the
experiment justified the choice of classic vector autoregression).</p>
      <p>The next stage is the application of a neural network. Based on the review of the analyzed studies,
it was decided to focus on recurrent and convolutional neural networks, in particular:
• the classical convolutional neural network;
• the classical recurrent neural network;
• the recurrent neural network with long-term memory;
• the bidirectional recurrent neural network with long-term memory;
• a hybrid neural network combining several convolutional networks with a bidirectional recurrent network with long-term memory.</p>
      <p>To overcome the problem of model training time, it was decided to use the MapReduce
technology. To determine the most effective neural network and the expediency of the proposed
parallelization method, a set of criteria was formed and the principle of linear additive convolution
with weighting coefficients was applied. Based on these criteria, the specified modifications, and the
implementation of the selected models using Python 3 libraries, a series of experiments was conducted
with data on the 2019 election process in Ukraine and the full-scale invasion of the Russian Federation
on the territory of Ukraine.</p>
      <p>In the course of the experiments, it was found that the bidirectional recurrent neural network with
long-term memory is the most effective, although it loses in speed to less complex models; at the same
time, the difference in efficiency between it and the hybrid neural network is insignificant. It was
found that the gain in reprocessing time when using MapReduce can reach 4.3 in the case of text and
4 in the case of signal, while the gain in neural network training time can reach 4.71 in the case of text
and 4.68 in the case of signal, bridging the performance gap between BiLSTM and RCNN. Therefore,
the authors can claim that the use of the built model based on BiLSTM (or RCNN) is highly effective
for determining the fact of audio forgery both in the form of text and in the form of a signal, and that
implementing data processing and neural network training using MapReduce is appropriate. Thus, the
goal of this study has been fulfilled.</p>
    </sec>
    <sec id="sec-14">
      <title>8. Acknowledgements</title>
      <p>The authors would like to thank the Armed Forces of Ukraine for the opportunity to carry out this
work during the full-scale invasion of the Russian Federation on the territory of Ukraine.</p>
    </sec>
    <sec id="sec-15">
      <title>9. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Anders, M., "Fake News Detection", European Data Protection Supervisor, available at: https://edps.europa.eu/press-publications/publications/techsonar/fake-news-detection_en (last accessed 14.10.2023).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Bansal, N., Aljrees, T., Yadav, D. P., Singh, K. U., Kumar, A., Verma, G. K., &amp; Singh, T. (2023), "Real-Time Advanced Computational Intelligence for Deep Fake Video Detection", Applied Science, No. 13(5), Article 3095. DOI: 10.3390/app13053095.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Batailler, C., Brannon, S. M., Teas, P. E., &amp; Gawronski, B. (2023), "A Signal Detection Approach to Understanding the Identification of Fake News", Perspectives on Psychological Science, No. 17(1), P. 78-98. DOI: 10.1177/1745691620986135.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Reis, J. C. S., Correia, A., Murai, F., Veloso, A., &amp; Benevenuto, F. (2019), "Supervised Learning for Fake News Detection", IEEE Intelligent Systems, No. 34(2), P. 76-81. DOI: 10.1109/MIS.2019.2899143.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Giandomenico</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishizaka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nunan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2021</year>
          ), “
          <article-title>Fake news, social media and marketing: A systematic review</article-title>
          ”,
          <source>Journal of Business Research</source>
          , Vol.
          <volume>124</volume>
          , P.
          <fpage>329</fpage>
          -
          <lpage>341</lpage>
          . DOI: 10.1016/j.jbusres.2020.11.037.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2023</year>
          ), “
          <article-title>Sustainable Development of Information Dissemination: A Review of Current Fake News Detection Research and Practice</article-title>
          ”,
          <source>Systems</source>
          , No.
          <volume>11</volume>
          (
          <issue>9</issue>
          ), Article 458. DOI: 10.3390/systems11090458.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Rocha</surname>
            ,
            <given-names>Y. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Moura</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desiderio</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourenço</surname>
            ,
            <given-names>F. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>de Figueiredo Nicolete</surname>
            ,
            <given-names>L. D.</given-names>
          </string-name>
          (
          <year>2023</year>
          ), “
          <article-title>The impact of fake news on social media and its influence on health during the COVID-19 pandemic: a systematic review</article-title>
          ”,
          <source>Journal of Public Health</source>
          , Vol.
          <volume>31</volume>
          , P.
          <fpage>1007</fpage>
          -
          <lpage>1016</lpage>
          . DOI: 10.1007/s10389-021-01658-z.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Bondielli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Oglio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcelloni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passaro</surname>
            ,
            <given-names>L. C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sabbatini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2023</year>
          ), “
          <article-title>MULTI-Fake-DetectiVE at EVALITA 2023: Overview of the MULTImodal Fake News Detection and VErification Task</article-title>
          ”.
          <source>Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2023)</source>
          : 8th International Conference, Parma, 7 September - 8 September 2023: CEUR Workshop Proceedings, No.
          <volume>3473</volume>
          , available at: https://ceur-ws.org/Vol-3473/paper32.pdf (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Sardar</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ansari</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm</article-title>
          ”,
          <source>Journal of The Institution of Engineers (India): Series B</source>
          , Vol.
          <volume>101</volume>
          , P.
          <fpage>641</fpage>
          -
          <lpage>650</lpage>
          . DOI: 10.1007/s40031-020-00485-2.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Duzhin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2022</year>
          ), “
          <article-title>Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets</article-title>
          ”,
          <source>Big Data and Cognitive Computing</source>
          , No.
          <volume>6</volume>
          (
          <issue>3</issue>
          ), Article 74. DOI: 10.3390/bdcc6030074.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Choudhary</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2021</year>
          ), “
          <article-title>Linguistic feature based learning model for fake news detection and classification</article-title>
          ”,
          <source>Expert Systems with Applications</source>
          , No.
          <volume>169</volume>
          , Article 114171. DOI: 10.1016/j.eswa.2020.114171.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilares</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Rodríguez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Vilares</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2021</year>
          ), “
          <article-title>Sentiment Analysis for Fake News Detection</article-title>
          ”,
          <source>Electronics</source>
          , No.
          <volume>10</volume>
          (
          <issue>11</issue>
          ), Article 1348. DOI: 10.3390/electronics10111348.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Tolosana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vera-Rodriguez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fierrez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morales</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ortega-Garcia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>Deepfakes and beyond: A Survey of face manipulation and fake detection</article-title>
          ”,
          <source>Information Fusion</source>
          , Vol.
          <volume>64</volume>
          , P.
          <fpage>131</fpage>
          -
          <lpage>148</lpage>
          . DOI: 10.1016/j.inffus.2020.06.014.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Afanasieva</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khovrat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Onyshchenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2023</year>
          ), “
          <article-title>Application of Neural Networks to Identify of Fake News</article-title>
          ”.
          <source>Computational Linguistics and Intelligent Systems (COLINS 2023)</source>
          : 7th International Conference, Kharkiv, 20 April - 21 April 2023: CEUR Workshop Proceedings, No.
          <volume>3396</volume>
          , P.
          <fpage>346</fpage>
          -
          <lpage>358</lpage>
          , available at: https://ceur-ws.org/Vol-3396/paper28.pdf (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>Using transfer learning, spectrogram audio classification, and MIT app inventor to facilitate machine learning understanding</article-title>
          ”, Massachusetts Institute of Technology, available at: https://dspace.mit.edu/handle/1721.1/127379 (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Breuer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eilat</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weinsberg</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>Friend or Faux: Graph-Based Early Detection of Fake Accounts on Social Networks</article-title>
          ”.
          <source>The Web Conference (WWW '20)</source>
          : International Conference, Taipei, 20 April - 24 April 2020: Association for Computing Machinery, P.
          <fpage>1287</fpage>
          -
          <lpage>1297</lpage>
          . DOI: 10.1145/3366423.3380204.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>A Discrete Hidden Markov Model for SMS Spam Detection</article-title>
          ”,
          <source>Applied Sciences</source>
          , No.
          <volume>10</volume>
          (
          <issue>14</issue>
          ), Article 5011. DOI: 10.3390/app10145011.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Najar</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamzami</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bouguila</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2019</year>
          ), “
          <article-title>Fake News Detection Using Bayesian Inference</article-title>
          ”.
          <source>Information Reuse and Integration for Data Science (IRI 2019)</source>
          : 20th IEEE International Conference, Los Angeles, 30 July - 1 August 2019: IEEE, P.
          <fpage>389</fpage>
          -
          <lpage>394</lpage>
          . DOI: 10.1109/IRI.2019.00066.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Montserrat</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yarlagadda</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Delp</surname>
            ,
            <given-names>E. J.</given-names>
          </string-name>
          (
          <year>2020</year>
          ), “
          <article-title>Generative Autoregressive Ensembles for Satellite Imagery Manipulation Detection</article-title>
          ”.
          <source>Workshop on Information Forensics and Security (WIFS 2020)</source>
          : 12th IEEE International Workshop, New York, 6 December - 11 December 2020: IEEE, P.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . DOI: 10.1109/WIFS49906.2020.9360909.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>You</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2019</year>
          ), “
          <article-title>Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming</article-title>
          ”,
          <source>Computers &amp; Chemical Engineering</source>
          , Vol.
          <volume>125</volume>
          , P.
          <fpage>434</fpage>
          -
          <lpage>448</lpage>
          . DOI: 10.1016/j.compchemeng.2019.03.034.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Khovrat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobziev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nazarov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yakovlev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2022</year>
          ), “
          <article-title>Parallelization of the VAR Algorithm Family to Increase the Efficiency of Forecasting Market Indicators During Social Disaster</article-title>
          ”.
          <source>Information Technology and Implementation (IT&amp;I 2022)</source>
          : 9th International Conference, Kyiv, 30 November - 2 December 2022: CEUR Workshop Proceedings, No.
          <volume>3347</volume>
          , P.
          <fpage>222</fpage>
          -
          <lpage>233</lpage>
          , available at: https://ceur-ws.org/Vol-3347/Paper_19.pdf (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          “
          <article-title>Long Short-Term Memory Networks (LSTM)</article-title>
          ”,
          <source>Data Base Camp</source>
          , available at: https://databasecamp.de/en/ml/lstms (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Zvornicanin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , “
          <article-title>Differences Between Bidirectional and Unidirectional LSTM</article-title>
          ”,
          <source>Baeldung</source>
          , available at: https://www.baeldung.com/cs/bidirectional-vs-unidirectional-lstm (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Ashley</surname>
          </string-name>
          , “
          <article-title>An Overview on Convolutional Neural Networks</article-title>
          ”,
          <source>Medium</source>
          , available at: https://medium.com/swlh/an-overview-on-convolutional-neural-networks-ea48e76fb186 (last accessed: 14.10.2023).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2021</year>
          ), “
          <article-title>Detecting Anomalous Kicks in Taekwondo With Spatial and Temporal Features</article-title>
          ”,
          <source>IEEE Access</source>
          , Vol.
          <volume>9</volume>
          , P.
          <fpage>164928</fpage>
          -
          <lpage>164934</lpage>
          . DOI: 10.1109/ACCESS.2021.3134967.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Al-Khasawneh</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uddin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>S. A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khasawneh</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abualigah</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mahmoud</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2022</year>
          ), “
          <article-title>An improved chaotic image encryption algorithm using Hadoop-based MapReduce framework for massive remote sensed images in parallel IoT applications</article-title>
          ”,
          <source>Cluster Computing</source>
          , Vol.
          <volume>25</volume>
          , P.
          <fpage>999</fpage>
          -
          <lpage>1013</lpage>
          . DOI: 10.1007/s10586-021-03466-2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>