<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hybrid Neural Network and Bayesian Classification Framework for Multi-Class Disinformation Detection in Socially Oriented Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Artem Khovrat</string-name>
          <email>artem.khovrat@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kobziev</string-name>
          <email>volodymyr.kobziev@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Strelchenko</string-name>
          <email>andrii.strelchenko@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14, Nauky, Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The widespread dissemination of manipulated information across interactive digital platforms represents a significant societal challenge that demands sophisticated computational interventions. This study introduces a two-tier machine learning architecture that integrates hybrid Recurrent Convolutional Neural Networks (RCNN) with probabilistic Bayesian classification for improved detection and categorization of false information. The research establishes a taxonomic framework distinguishing five categories of misleading content. The classification system uses a seven-dimensional feature vector incorporating emotional valence analysis, rhetorical pattern density, negative linguistic structure frequency, contextual sentiment indexing, deception probability coefficients, content influence metrics, and emotional intensity measurements. Empirical evaluation was conducted on balanced datasets comprising Ukrainian conflict-related content (20,000 instances) and an English-language comparison dataset on the 2020 U.S. presidential election. The proposed two-stage method achieved a classification accuracy of 95.3%, compared with 65.4% for single-layer RCNN implementations, a 46% relative improvement. The hybrid system also showed high data efficiency, requiring a tenfold smaller training set than conventional approaches that need 5,000 samples. Computational analysis indicated a modest processing overhead of 5.0%, while a multi-metric assessment showed a 52.5% overall system improvement. A distributed MapReduce implementation provides computational scalability for large-scale deployment. The research contributes a practical framework for automated content verification systems, with particular applicability during periods of information warfare and social instability.</p>
      </abstract>
      <kwd-group>
        <kwd>computational linguistics</kwd>
        <kwd>content verification</kwd>
        <kwd>distributed computing</kwd>
        <kwd>fake news</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern digital technologies have reached unprecedented levels of sophistication in content
manipulation capabilities, prompting legislative bodies worldwide to address the challenges of
identifying inauthentic information across social media platforms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The severity of this
challenge varies considerably across different media formats. While video manipulation techniques
remain relatively detectable due to technical limitations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], textual content and image falsification
have achieved concerning levels of refinement, driving substantial research efforts and practical
detection solutions [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Under normal circumstances, such deceptive content primarily generates
interpersonal disputes within social groups, with particularly pronounced effects in digital
communities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        However, during periods of geopolitical instability, the stakes escalate
dramatically as information processing becomes clouded by heightened emotional responses,
compromising analytical reasoning capabilities. The integration of manipulative content into
mainstream media channels can accelerate societal transformations triggered by crisis events while
magnifying their destructive potential [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The consequences span financial, sociocultural, and strategic domains, fundamentally
distorting public discourse. The extensive disinformation campaigns accompanying the
Russian-Ukrainian conflict exemplify such scenarios, where false narratives systematically obscured war
crimes and eroded confidence in Ukrainian defense institutions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Detection strategies must be adapted to the type of data being analyzed. This investigation
concentrates exclusively on textual news content, reflecting current limitations in generating
video falsifications authentic enough to deceive human perception.
      </p>
      <p>
        Three primary methodological approaches dominate textual data classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
1. Probabilistic frameworks encompassing naive Bayesian classifiers, Markov chain models,
and Bayesian network architectures.
2. Neural network implementations, including recurrent and convolutional architectures,
transformer models, and alternative deep learning paradigms.
3. Polynomial modeling approaches incorporating linear additive convolution with weighted
coefficients and threshold parameters.
      </p>
      <p>
        Research conducted by Spanish investigators into inauthentic textual content [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] revealed
that machine learning algorithms demand extensive datasets to achieve superior classification
performance (exceeding 95% accuracy) while exhibiting heightened sensitivity to outlier instances.
Alternative detection methodologies merit consideration, particularly graph-based approaches
extensively investigated by Harvard researchers [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for identifying fraudulent account profiles.
These techniques deliver rapid results with minimal data requirements, though their adaptation to
textual analysis necessitates substantial preprocessing that eliminates speed advantages.
      </p>
      <p>Spam filtering research provides relevant insights for false content detection.
Chinese-American research teams demonstrated the effectiveness of Markov chains [12], though domain-specific
requirements make such approaches computationally intensive, as corroborated by
researchers in Montreal, Canada [13].</p>
      <p>Autoregressive techniques offer alternative solutions for detecting artificially generated
content, provided authentic samples from target individuals are available. However, these models
prove ineffective against contextual manipulations and are therefore excluded from subsequent
analysis.</p>
      <p>
        Previous investigations focusing on binary classification between authentic and fabricated
content have explored probabilistic models alongside diverse neural network architectures [
        <xref ref-type="bibr" rid="ref5 ref14">5, 14</xref>
        ].
Results identified hybrid networks combining recurrent and convolutional components,
specifically RCNN architectures, as the optimal balance between accuracy and computational
efficiency. A significant challenge emerged in assessing the societal impact of inauthentic content.
Certain materials exhibit obvious humorous characteristics readily identifiable by human readers,
presenting minimal societal risk. Conversely, content designed to undermine confidence in critical
legislative decisions poses substantial public threats.
      </p>
      <p>Literature analysis reveals several critical research gaps in falsified information detection.
Current studies predominantly emphasize binary classification without considering societal threat
gradations. Satirical content and deliberate disinformation require distinct detection strategies, yet
comprehensive taxonomies accounting for impact scale and potential harm remain
underdeveloped. While hybrid neural networks like RCNN demonstrate exceptional performance,
their capabilities could be substantially enhanced through multilayer architectural designs. Existing
research has not explored integrating RCNN with complementary classification methodologies to
optimize overall system performance. Additionally, most current solutions demand significant
computational resources and extensive training datasets, constraining practical deployment
scenarios.</p>
      <p>This research aims to develop a dual-layer fake information classification model integrating
naive Bayesian classification with hybrid recurrent-convolutional neural networks. The following
objectives guide this investigation:
• Establish markers characteristic of fabricated data to facilitate detection processes.
• Conduct expert assessments to define the primary categories of fake information.
• Develop classification models for separating groups of falsified data using naive Bayesian
approaches.
• Execute experimental validation comparing the proposed dual-layer model against
single-layer RCNN implementations.
• Analyze experimental outcomes and formulate conclusions through multi-criteria decision
analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Indicators of disinformation</title>
      <p>Constructing effective analytical models requires careful formulation of feature vectors as critical
determinants of classification performance. Through comprehensive linguistic analysis and
empirical observation, a systematic categorization of discriminative characteristics inherent to
fabricated information was identified and organized:
• Excessive utilization of interrogative constructions designed to manipulate sociolinguistic
contexts.
• Systematic elimination of negative constructions combined with hyperbolic term
substitution.
• Inappropriate incorporation of appellative and stimulating linguistic structures, particularly
evident in contexts attempting to simulate legitimate news discourse.
• Overuse of pronouns, which frequently correlates with contextual manipulation attempts.
• Presence of systematic grammatical and stylistic anomalies, especially within purported
quotations from authoritative sources.</p>
      <p>This feature set supports the development of robust, multidimensional classification
models capable of identifying fabricated textual information with higher accuracy and
recall. It addresses the complexity of modern disinformation campaigns while keeping
computational cost low through strategic feature selection and optimization.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Classes of disinformation</title>
      <p>The initial phase in addressing multi-classification challenges involves establishing fundamental
disinformation categories through rigorous methodological frameworks. To determine this
classificatory scheme, an expert panel comprising 100 data analysts from various European and
North American countries was assembled. Subsequently, an open survey utilizing standardized
assessment protocols was conducted to identify the most vulnerable types of information
falsification. Aggregated responses from 300 participants (n=300, 95% confidence interval, margin
of error ±5.66%) were used to define the following groups:
• Overt satirical material (featuring explicit comedic indicators and recognizable structural
patterns that signal non-serious intent).
• Subtle satirical content (requiring contextual interpretation and cultural knowledge for
proper identification of humorous intent).
• Targeted personal disinformation (focused misinformation campaigns directed at specific
individuals or narrow demographic groups).
• Regional-scale false narratives (misleading information designed to influence broader
communities, multiple regions, or large population segments).
• Global-impact disinformation (systematic false information campaigns with international
reach and potential for widespread societal disruption).</p>
      <p>The categorization structure demonstrates a hierarchical framework with escalating scope and
potential impact, facilitating both quantitative and qualitative analysis of disinformation patterns.
This taxonomic approach enables more nuanced investigation of information manipulation
strategies while providing a standardized foundation for comparative analysis across different
threat scenarios and deployment contexts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Target features</title>
      <p>Following the establishment of fabricated information characteristics, the methodology proceeds to
develop a feature set that serves as input variables for the models. The primary metric, "Emotional
Characteristic," is derived using content analysis principles [15] through the following
algorithmic sequence:
1. Segmentation of textual content into sentence units and tokenization of lexical elements,
with exclusion of non-semantic constructions (e.g., "however," "this," "or").
2. Application of lemmatization and stemming to extract morphological roots from the
vocabulary set.
3. Computation and normalization of frequency-emotional indicators at the sentence level.
4. Sentiment analysis using the NLTK module in Python 3 to determine lexical frequency
distributions and emotional valence metrics.</p>
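      <p>The four steps above can be sketched with the Python standard library alone. This is an illustrative stand-in rather than the authors' NLTK implementation: the stop-word set, the naive suffix-stripping stemmer, and the valence lexicon are hypothetical. In NLTK, <monospace>sent_tokenize</monospace>, <monospace>PorterStemmer</monospace>, and the VADER <monospace>SentimentIntensityAnalyzer</monospace> could fill these roles, although some of them require downloaded data packages.</p>

```python
import re

# Hypothetical resources for illustration only; a real pipeline would use NLTK data.
STOP_WORDS = {"however", "this", "or", "the", "a", "an", "and", "is", "are", "of", "to"}
VALENCE = {"terrible": -1.0, "shock": -0.8, "disaster": -1.0, "good": 0.5, "safe": 0.6}


def crude_stem(token: str) -> str:
    """Naive suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def sentence_scores(text: str) -> list[float]:
    """Split, tokenize, drop stop words, stem, score with the lexicon, normalize per sentence."""
    scores = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = [t for t in re.findall(r"[a-z']+", sentence.lower()) if t not in STOP_WORDS]
        if not tokens:
            continue  # skip empty fragments, e.g. after the final period
        stems = [crude_stem(t) for t in tokens]
        total = sum(VALENCE.get(s, 0.0) for s in stems)
        scores.append(total / len(stems))  # normalize by sentence length
    return scores


def emotional_characteristic(text: str) -> float:
    """Mean sentence-level valence over the whole text (0.0 for empty input)."""
    scores = sentence_scores(text)
    return sum(scores) / len(scores) if scores else 0.0


print(emotional_characteristic("This is terrible. The city is safe."))
```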
      <p>Additionally, six auxiliary quantitative indicators were incorporated into the analytical
framework:
• Rhetorical Density Coefficient (RDC): defined as the ratio of rhetorical constructions to
total sentence count, RDC = RCC/TS, where RCC is the Rhetorical Construction Count and TS is the
number of Total Sentences.
• Negative Construction Frequency (NCF): quantifies the density of negative linguistic
structures, NCF = NCC/TS, where NCC is the Negative Construction Count and TS is the number of
Total Sentences.
• Contextual Emotional Index (CEI): derived from sentiment analysis of temporally relevant,
high-traffic content; analyzes emotional valence patterns among the 50 highest-rated news
articles; provides temporal calibration for the classification algorithms.
• Suspicion Coefficient (SC): calculated through lexical comparison with predefined
deception indicators; uses a validated corpus of terms associated with fabricated
information; applies normalized frequency analysis for inter-textual comparison.
• Message Impact Factor (MIF): hierarchical classification of content significance; a weighting
system based on content domain and coverage; includes multidimensional impact assessment.
• Sentiment Magnitude Vector (SMV): aggregated measure of emotional content intensity;
normalized representation of overall message valence; includes both polarity and
magnitude components.</p>
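      <p>The two ratio indicators can be sketched as follows. The detection rules are assumptions, since the text does not specify how rhetorical or negative constructions are identified: here a rhetorical construction is approximated by an interrogative sentence, and negative constructions by tokens from a small hypothetical negation list.</p>

```python
import re

# Hypothetical negation lexicon; the text does not list its negative constructions.
NEGATIONS = {"not", "no", "never", "none", "nobody", "nothing", "neither", "nor"}


def split_sentences(text: str) -> list[str]:
    """Split into sentences, keeping terminal punctuation so questions stay recognizable."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text) if s.strip()]


def rdc(text: str) -> float:
    """RDC = RCC / TS, approximating rhetorical constructions by interrogative sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    rcc = sum(1 for s in sentences if s.endswith("?"))
    return rcc / len(sentences)


def ncf(text: str) -> float:
    """NCF = NCC / TS, counting negation tokens (including "n't" contractions)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    tokens = re.findall(r"[a-z']+", text.lower())
    ncc = sum(1 for t in tokens if t in NEGATIONS or t.endswith("n't"))
    return ncc / len(sentences)


sample = "Who benefits from this? Nobody knows. They did not report it. Prices rose."
print(rdc(sample), ncf(sample))  # 0.25 0.5
```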
      <p>This integrated approach facilitates comprehensive feature extraction and analysis, ensuring
robust classification of potentially fabricated information across various contextual domains.
Parallelized implementation ensures computational efficiency while preserving analytical precision
across diverse linguistic and cultural contexts.</p>
    </sec>
    <sec id="sec-5">
      <title>5. First layer for classification model</title>
      <p>
        In traditional Convolutional Neural Network (CNN) architectures, filter operations facilitate
incorporation of local spatial dependencies; however, the distinctive nature of the proposed
indicators requires understanding of extended temporal sequences without introducing future state
dependencies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This presents limitations as important contextual information may exist beyond
the CNN's receptive field boundaries. To address this architectural constraint, a hybrid approach
combining Recurrent Neural Network (RNN) and CNN methodologies was implemented
(illustrated in Figure 1 in simplified form).
      </p>
      <p>The proposed RCNN architecture strategically combines the strengths of convolutional and
recurrent neural networks through a multi-stage processing pipeline. This integration addresses
the limitations of each approach when applied individually to textual disinformation detection. As
a critical foundation of this approach, the initial stage utilizes a convolutional layer configuration
for feature extraction. Textual input undergoes tokenization and embedding transformation,
resulting in a matrix representation where each row corresponds to a token and each column
represents an embedding dimension. Several architectural enhancements were implemented to
optimize model performance:
• Implementation of dilated convolutions to expand the effective receptive field; utilization of
skip connections to preserve detailed feature information; integration of attention
mechanisms to capture long-term dependencies.
• Implementation of gated memory units for information control; utilization of adaptive
forget gates for memory retention optimization; integration of memory-efficient
backpropagation methods.
• Implementation of residual connections to facilitate gradient propagation; utilization of
layer normalization for stable training dynamics; integration of gradient clipping to
prevent numerical instability.</p>
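      <p>The benefit of dilation can be made concrete with the usual receptive-field arithmetic for stacked stride-1 convolutions, where each layer with kernel size k and dilation d adds (k − 1)·d positions. The kernel size and dilation rates below are hypothetical, as the text does not report them.</p>

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in tokens) of stacked stride-1 1-D convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(receptive_field(3, [1, 1, 1, 1]))  # 9  (four undilated layers)
print(receptive_field(3, [1, 2, 4, 8]))  # 31 (same depth, doubling dilation)
```

      <p>Doubling the dilation at each layer grows the receptive field exponentially with depth, while the parameter count stays the same as for undilated layers.</p>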
      <p>The training protocol for this integrated architecture includes curriculum learning for
improved convergence, beginning with simpler examples and gradually introducing complex cases.
Dynamic batch sizing optimizes memory utilization, starting with larger batches and progressively
reducing size for enhanced convergence precision. Early stopping with patience factor p = 5
monitors validation loss to prevent overfitting, while learning rate scheduling implements initial
rate 0.001 with exponential decay factor 0.95 per epoch. Regularization strategies include dropout
layers (rate = 0.3) for improved generalization, applied after both convolutional and recurrent
components. L2 regularization (λ = 0.01) prevents overfitting, particularly for dense layers, while
feature-oriented regularization enables robust feature learning through normalization at multiple
network stages. Recurrent dropout (rate = 0.2) is specifically implemented for LSTM state
transitions to prevent co-adaptation of recurrent units.</p>
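      <p>The learning-rate schedule and early-stopping rule described above can be sketched in plain Python. In TensorFlow, the same behavior is normally obtained with <monospace>tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)</monospace> and <monospace>tf.keras.optimizers.schedules.ExponentialDecay</monospace> (with <monospace>decay_steps</monospace> set to the number of steps per epoch and <monospace>staircase=True</monospace> for per-epoch decay). The validation-loss sequence in the example is hypothetical.</p>

```python
def learning_rate(epoch: int, initial: float = 1e-3, decay: float = 0.95) -> float:
    """Exponential decay applied once per epoch: lr = initial * decay ** epoch."""
    return initial * decay ** epoch


def early_stopping_epoch(val_losses: list[float], patience: int = 5) -> int:
    """Index of the epoch at which training stops, given per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the best loss.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if best > loss:  # strict improvement resets the counter
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # ran out of epochs without triggering


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.65, 0.63]
print(early_stopping_epoch(losses))  # 7
print(round(learning_rate(10), 6))   # 0.000599
```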
      <p>Several methods enhance computational efficiency:
• Model quantization reduces the memory footprint by converting 32-bit floating-point
operations to 16-bit.
• Sparse tensor operations are used, particularly for high-dimensional embedding layers.
• Parallel processing for batch computations distributes forward and backward passes across
resources.
• Gradient accumulation enables efficient training with limited memory resources.</p>
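      <p>The memory effect of 16-bit quantization can be illustrated with Python's <monospace>struct</monospace> module, whose <monospace>e</monospace> format code packs IEEE 754 half-precision values. This is only a storage illustration with hypothetical weight values, not the framework-level mixed-precision mechanism a TensorFlow model would use.</p>

```python
import struct

weights = [0.1, -1.5, 3.14159, 1000.5]  # hypothetical weight values

# Pack the same values as 32-bit ("f") and 16-bit ("e") IEEE 754 floats,
# using "=" for standard sizes without alignment padding.
fp32 = struct.pack(f"={len(weights)}f", *weights)
fp16 = struct.pack(f"={len(weights)}e", *weights)
print(len(fp32), len(fp16))  # 16 8: half the memory

# Round-tripping through float16 shows the precision cost of quantization.
restored = struct.unpack(f"={len(weights)}e", fp16)
for original, value in zip(weights, restored):
    print(f"{original} -> {value}")
```

      <p>Packing halves the byte count, while values such as 0.1 and 3.14159 are stored only approximately in 16 bits; exactly representable values such as −1.5 survive unchanged.</p>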
      <p>This enhanced architectural configuration demonstrates superior performance characteristics
while maintaining computational efficiency. The integration of bidirectional recurrent components
with convolutional layers enables effective capture of both spatial and temporal dependencies in
the feature space, achieving validation accuracy of 94.3% on benchmark datasets. The hybrid
architecture successfully addresses disinformation detection challenges through complementary
processing pathways: CNN components effectively extract local linguistic patterns and stylistic
markers, while LSTM components capture long-term dependencies and contextual inconsistencies
that frequently characterize fabricated information.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Second layer for classification model</title>
      <p>The Naive Bayesian Classification (NBC) methodology is based on Bayesian probability theory:
it computes class membership probabilities under the assumption that features are conditionally
independent given the class. This assumption is reasonable in the current context, because the
values of the defined features are determined largely independently of one another.</p>
      <p>Bayes' theorem fundamentally describes the probability of event occurrence based on prior
knowledge of conditions related to that event. In this context, it calculates the probability of
information belonging to a specific class, considering several key components: the probability of
observing specific features when information belongs to that class, the general probability of class
occurrence in the dataset, and the overall probability of observing these specific features among all
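possible classes. This relationship is expressed mathematically as:
<disp-formula><tex-math>P(C_k \mid F_1, \ldots, F_n) = \frac{P(F_1, \ldots, F_n \mid C_k)\, P(C_k)}{P(F_1, \ldots, F_n)}</tex-math></disp-formula></p>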
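      <p>Here <inline-formula><tex-math>P(C_k \mid F_1, \ldots, F_n)</tex-math></inline-formula> represents the posterior probability of class
<inline-formula><tex-math>C_k</tex-math></inline-formula> given features <inline-formula><tex-math>F_1</tex-math></inline-formula> to
<inline-formula><tex-math>F_n</tex-math></inline-formula>; <inline-formula><tex-math>P(F_1, \ldots, F_n \mid C_k)</tex-math></inline-formula> is the likelihood of observing these features in class
<inline-formula><tex-math>C_k</tex-math></inline-formula>; <inline-formula><tex-math>P(C_k)</tex-math></inline-formula> is the prior
probability of class <inline-formula><tex-math>C_k</tex-math></inline-formula>; and <inline-formula><tex-math>P(F_1, \ldots, F_n)</tex-math></inline-formula> is the evidence, or overall probability of the feature set.</p>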
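      <p>Under the naive independence assumption, the likelihood term can be decomposed as:
<disp-formula><label>(1)</label><tex-math>P(F_1, \ldots, F_n \mid C_k) = \prod_{j=1}^{n} P(F_j \mid C_k)</tex-math></disp-formula>
Classes <inline-formula><tex-math>C_k</tex-math></inline-formula> correspond to the five disinformation categories defined above:
<disp-formula><label>(2)</label><tex-math>C_k \in \{C_1, C_2, C_3, C_4, C_5\}</tex-math></disp-formula></p>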
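      <p>By construction, these classes are mutually exclusive and exhaustive: each message belongs to
exactly one of them, so the class probabilities sum to 1. This allows Bayes' theorem to be applied
directly. Features <inline-formula><tex-math>F_j</tex-math></inline-formula> correspond to the seven indicators established above
(<inline-formula><tex-math>n = 7</tex-math></inline-formula>): the Emotional Characteristic, RDC, NCF, CEI, SC, MIF, and SMV.</p>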
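      <p>To prevent numerical underflow when many small probabilities are multiplied, computations are
implemented in logarithmic space, with feature weighting based on information gain metrics:
<disp-formula><label>(3)</label><tex-math>\log P(C_k \mid F_1, \ldots, F_n) = \log P(C_k) + \sum_{j=1}^{n} \log P(F_j \mid C_k) - \log P(F_1, \ldots, F_n)</tex-math></disp-formula>
<disp-formula><label>(4)</label><tex-math>S_k = \log P(C_k) + \sum_{j=1}^{n} w_j \log P(F_j \mid C_k)</tex-math></disp-formula>
where the class-independent evidence term is dropped and <inline-formula><tex-math>S_k</tex-math></inline-formula> denotes the weighted log score of class <inline-formula><tex-math>C_k</tex-math></inline-formula>.</p>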
      <p>Implementation follows a three-phase approach. During the training phase, the
conditional probability distributions <inline-formula><tex-math>P(F_j \mid C_k)</tex-math></inline-formula> are estimated for each class and feature using kernel
density estimation, which is particularly suitable for continuous features. Class prior probabilities
<inline-formula><tex-math>P(C_k)</tex-math></inline-formula> are computed from frequency distributions in the training dataset, with Laplace smoothing to address
class imbalance.</p>
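      <p>During the inference phase, feature values are extracted from input instances and normalized
according to the procedures specified above. For each class, the posterior score is computed from
Bayesian principles under the naive independence assumption, and the instance is assigned to the
highest-scoring class:
<disp-formula><tex-math>\hat{C} = \arg\max_{k} \left[ \log P(C_k) + \sum_{j=1}^{n} w_j \log P(F_j \mid C_k) \right]</tex-math></disp-formula></p>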
      <p>Here <inline-formula><tex-math>w_j</tex-math></inline-formula> represents the normalized information gain weight for feature <inline-formula><tex-math>F_j</tex-math></inline-formula>. Several additional
optimization mechanisms enhance classifier performance, including feature normalization and
bandwidth parameter optimization for kernel density estimation. Together, these methods
enable robust classification through systematic assessment of class membership probabilities,
particularly when the features are approximately independent.</p>
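      <p>A compact sketch of the second layer under stated assumptions: one Gaussian kernel density estimate per class and feature, Laplace-smoothed priors, and a weighted log-space score with an argmax decision. The weights, the fixed bandwidth, the class name <monospace>WeightedKDENaiveBayes</monospace>, and the toy two-feature data are hypothetical; the bandwidth optimization and information-gain computation described in the text are not reproduced.</p>

```python
import math
from collections import Counter


def gaussian_kde(samples: list[float], bandwidth: float):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


class WeightedKDENaiveBayes:
    """Naive Bayes with per-class, per-feature KDE likelihoods, scored in log space."""

    def __init__(self, weights: list[float], bandwidth: float = 0.1, alpha: float = 1.0):
        self.weights = weights      # hypothetical normalized information-gain weights w_j
        self.bandwidth = bandwidth  # fixed here; the text optimizes it separately
        self.alpha = alpha          # Laplace smoothing constant for the class priors

    def fit(self, X: list[list[float]], y: list[int]) -> "WeightedKDENaiveBayes":
        counts = Counter(y)
        self.classes = sorted(counts)
        denom = len(y) + self.alpha * len(self.classes)
        self.log_prior = {c: math.log((counts[c] + self.alpha) / denom) for c in self.classes}
        self.kdes = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.kdes[c] = [gaussian_kde([r[j] for r in rows], self.bandwidth)
                            for j in range(len(X[0]))]
        return self

    def score(self, x: list[float], c: int) -> float:
        """log P(C_k) + sum_j w_j log P(F_j | C_k), with a floor to avoid log(0)."""
        return self.log_prior[c] + sum(
            w * math.log(max(kde(v), 1e-300))
            for w, kde, v in zip(self.weights, self.kdes[c], x))

    def predict(self, x: list[float]) -> int:
        """Return the class with the highest weighted log score (argmax)."""
        return max(self.classes, key=lambda c: self.score(x, c))


X = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.80]]
y = [0, 0, 1, 1]
model = WeightedKDENaiveBayes(weights=[0.5, 0.5]).fit(X, y)
print(model.predict([0.12, 0.22]), model.predict([0.82, 0.88]))  # 0 1
```

      <p>Working with log densities keeps the per-feature terms additive, which is also what makes the per-feature parallelization discussed in the next section straightforward.</p>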
    </sec>
    <sec id="sec-7">
      <title>7. Distributed computing</title>
      <p>MapReduce is applied independently during input data preprocessing and throughout neural
network training. For textual data preprocessing, building a vocabulary that is as complete as
possible is a critical requirement. A specialized non-relational database with multithreaded
access support was created to store the entire available lexicon after basic processing (removal
of function words, lemmatization, and stemming). Consequently, increasing the volume of processed
material improves the accuracy of the corresponding frequency characteristics.</p>
      <p>For the RCNN architecture, the initial phase involves the CNN convolutional layer. Weight
parameters are iteratively adjusted using their partial gradients after each training batch
passes through the network. Training can therefore be parallelized by segmenting the data into
multiple parts, each processed by a separate CNN replica that computes gradients independently.
The reducer then aggregates these results to obtain the values used to update the weight
coefficients in the next iteration. After the convolutional layer, the aggregated data proceeds
to the BiLSTM. To accelerate this bidirectional network, its forward and backward networks can be
assigned to separate nodes; in that case, the reduction function aggregates the results of the
two networks.</p>
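      <p>The map and reduce roles in data-parallel training can be illustrated on a toy one-parameter model, a hypothetical stand-in for the CNN: each mapper returns the summed gradient and sample count for its shard, and the reducer adds these pairs, so the averaged result equals the full-batch gradient regardless of how the data are split.</p>

```python
from functools import reduce


def map_shard(w: float, shard: list[tuple[float, float]]) -> tuple[float, int]:
    """Mapper: summed squared-error gradients for the toy model y_hat = w * x, plus count."""
    grad_sum = sum(2.0 * (w * x - y) * x for x, y in shard)
    return grad_sum, len(shard)


def reduce_partials(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    """Reducer: combine two partial (gradient sum, sample count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def mapreduce_gradient(w: float, shards: list[list[tuple[float, float]]]) -> float:
    """Mean gradient over all shards; equals the full-batch gradient for any shard sizes."""
    grad_sum, n = reduce(reduce_partials, (map_shard(w, s) for s in shards))
    return grad_sum / n


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 8.0)]
full = mapreduce_gradient(1.5, [data])
split = mapreduce_gradient(1.5, [data[:1], data[1:]])
print(full, split)  # -6.0 -6.0
```

      <p>Returning sums and counts rather than per-shard means is the design choice that keeps the result exact when shards have unequal sizes.</p>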
      <p>The Naive Bayesian classifier proves particularly suitable for parallel processing due to its
probabilistic nature and independence of computations for different features.</p>
      <p>During the training stage, data is distributed among nodes for independent computation of
statistics for each feature. Each node computes local frequencies and probabilities for its data
portion. The reduction function aggregates these statistics to obtain the global probability distributions
<inline-formula><tex-math>P(F_j \mid C_k)</tex-math></inline-formula> and prior probabilities <inline-formula><tex-math>P(C_k)</tex-math></inline-formula>.</p>
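      <p>The aggregation step can be sketched as key-wise addition of partial statistics. For simplicity, this hypothetical sketch aggregates class counts and per-class feature sums (enough for Laplace-smoothed priors and class means); the kernel density estimates of Section 6 would additionally require each class's samples or binned frequencies.</p>

```python
from functools import reduce


def map_statistics(rows):
    """Mapper: local class counts and per-class feature sums for one shard."""
    stats = {}
    for features, label in rows:
        stats[("count", label)] = stats.get(("count", label), 0) + 1
        for j, value in enumerate(features):
            key = ("sum", label, j)
            stats[key] = stats.get(key, 0.0) + value
    return stats


def reduce_stats(a, b):
    """Reducer: add two partial statistics dictionaries key by key."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def laplace_priors(stats, classes, alpha=1.0):
    """Global priors P(C_k) with add-alpha (Laplace) smoothing."""
    n = sum(stats.get(("count", c), 0) for c in classes)
    return {c: (stats.get(("count", c), 0) + alpha) / (n + alpha * len(classes))
            for c in classes}


shards = [[([0.2, 0.1], 0), ([0.9, 0.7], 1)], [([0.3, 0.2], 0)]]
stats = reduce(reduce_stats, map(map_statistics, shards))
priors = laplace_priors(stats, classes=range(5))
print(priors[0], priors[1], priors[4])  # 0.375 0.25 0.125
```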
      <p>During the classification phase, posterior probability computations for different classes can be
performed in parallel on separate nodes. Each node receives the feature vector and computes
membership probability for its assigned class subset. Final classification is determined through
comparison of results from all nodes.</p>
      <p>Additionally, probability computations for different features can be parallelized, since features
are independent under the naive assumption. This allows the computations of
<inline-formula><tex-math>\log P(F_j \mid C_k)</tex-math></inline-formula> to be distributed among nodes, with the results combined by summation in
logarithmic space (equivalent to multiplying the probabilities).</p>
    </sec>
    <sec id="sec-8">
      <title>8. Experimental environment</title>
      <p>Contemporary neural network research demands controlled experimental protocols requiring
precise implementation structures and standardized execution environments.</p>
      <p>Temporal measurements were taken with Python 3's datetime library, which provides
microsecond resolution. Computational optimization relies on the numpy and polars libraries,
while linguistic processing uses nltk. TensorFlow provides the neural network framework used to
develop the model architectures and training protocols.</p>
      <p>Validation rigor derives from two distinct datasets focusing on contemporary sociopolitical
events. Primary data analysis encompasses the Russian-Ukrainian war, consisting of 20,000
balanced records derived from 5,000 initial trilingual posts standardized through Ukrainian
linguistic transformation. Additional analysis uses a 2020 US election dataset of equivalent volume
in English, enabling cross-linguistic validation. Both datasets use an 80/20 training/testing split,
with error mitigation protocols applied.</p>
      <p>The experimental corpora comprise two primary sources: (1) the Ukrainian conflict-related
dataset collected from verified open-access Telegram and Twitter channels (January 2022 – March
2025), standardized to Ukrainian language through semi-automatic translation and lemmatization;
and (2) an English-language dataset derived from the 2020 U.S. presidential election discourse
(August 2020 – December 2020). Both datasets were compiled exclusively from publicly available,
non-personal content and licensed under the Creative Commons Attribution 4.0 International (CC
BY 4.0) framework to ensure replicability and ethical compliance. The resulting corpora include
only textual news statements; personally identifiable information and metadata were removed
before processing.</p>
      <p>The annotated corpus used for training and evaluation was evenly distributed across the five
identified categories of disinformation, ensuring class balance and comparability. Each of the five
classes, namely overt satire, subtle satire, targeted personal disinformation, regional narratives,
and global-impact disinformation, comprised approximately one-fifth of the total dataset (around four thousand
samples per class in the Ukrainian corpus). The same proportional structure was preserved in the
English-language corpus. For model development, data were randomly divided into 80 percent for
training and 20 percent for testing, with stratification applied to maintain class balance across both
subsets.</p>
      <p>Methodological reliability stems from evaluation protocols incorporating the
expertise of 50 data analysis specialists from various countries. Performance assessment
uses a weighting scheme that prioritizes accuracy (16 points), measured as a weighted combination of
Precision (0.80) and Recall (0.20). Processing efficiency and data volume optimization contribute
equally (2 points each) to the evaluation matrix. The criteria are aggregated by linear additive
convolution (a weighted sum with these coefficients), ensuring comprehensive model assessment while
maintaining focus on classification accuracy. This approach demonstrates particular effectiveness
in handling high-dimensional feature spaces and complex linguistic patterns across different
languages, specifically minimizing false-negative classifications in socially sensitive contexts.
Architectural flexibility facilitates seamless computational node integration, ensuring scalable
performance optimization without structural modifications. Such adaptability proves invaluable
when processing heterogeneous data streams while maintaining stable classification accuracy
across diverse linguistic and contextual domains.</p>
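      <p>The aggregation can be written as a short sketch. The weights 16, 2, and 2 and the 0.80/0.20 precision-recall split come from the text; dividing by the total weight, and the normalized efficiency and data-volume scores in the example, are assumptions for illustration.</p>

```python
# Weights from the text: accuracy 16 points, processing efficiency 2, data volume 2.
CRITERIA_WEIGHTS = {"accuracy": 16.0, "efficiency": 2.0, "data_volume": 2.0}


def accuracy_criterion(precision: float, recall: float) -> float:
    """Accuracy criterion as the 0.80/0.20 weighted combination of precision and recall."""
    return 0.80 * precision + 0.20 * recall


def additive_score(criteria: dict[str, float]) -> float:
    """Linear additive convolution: weighted sum of [0, 1] criteria, divided by total weight."""
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted = sum(CRITERIA_WEIGHTS[name] * criteria[name] for name in CRITERIA_WEIGHTS)
    return weighted / total_weight


score = additive_score({
    "accuracy": accuracy_criterion(precision=0.95, recall=0.95),
    "efficiency": 0.9,   # hypothetical normalized efficiency score
    "data_volume": 0.8,  # hypothetical normalized data-efficiency score
})
print(round(score, 4))
```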
      <p>Experimental uncertainty quantification requires systematic identification and mitigation of
potential error sources within the measurement framework. Analysis of the experimental protocol
reveals two primary uncertainty categories: temporal measurement errors and accuracy estimation
deviations. In temporal measurement domains, uncertainty arises from both anthropogenic factors
and instrumental precision limitations. Human factors introduce variability through operational
inconsistencies, while instrumental error manifests through systematic and random deviations in
measurement equipment performance. These temporal uncertainties directly impact computational
efficiency assessment and system response evaluation. Accuracy estimation uncertainty primarily
stems from data quality variations and integrity considerations. These uncertainties may manifest
through dataset incompleteness, annotation inconsistencies, or classification ambiguities,
potentially affecting performance metric reliability.</p>
      <p>To address these uncertainties, a measurement protocol was established in which each
performance indicator was measured in ten repeated runs (n = 10). Repeated measurement enables
statistical validation of the results and reduces the influence of random fluctuations. It also
allows standard deviations and confidence intervals to be computed, giving a more complete picture
of model performance stability.</p>
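<p>The repeated-measurement protocol implies a simple per-indicator summary. The sketch below is one way to compute it, assuming that the reported σ is the sample standard deviation over the ten runs and that a two-sided 95% Student-t interval is used (neither choice is stated explicitly in the text). The run values are hypothetical, not the paper's raw measurements, and the function name is illustrative.</p>

```python
import math
import statistics

# Two-sided 95% Student-t critical value for n - 1 = 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262


def summarize_runs(values):
    """Return (mean, sample SD, 95% CI half-width) for exactly ten repeated runs."""
    if len(values) != 10:
        raise ValueError("this sketch hard-codes the t value for n = 10")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    half_width = T_CRIT_95_DF9 * sd / math.sqrt(len(values))
    return mean, sd, half_width


if __name__ == "__main__":
    # Hypothetical accuracies (%) from ten runs; not the paper's raw measurements.
    runs = [95.0, 95.6, 95.3, 94.9, 95.7, 95.2, 95.4, 95.1, 95.5, 95.3]
    m, s, h = summarize_runs(runs)
    print(f"mean = {m:.2f}%, sd = {s:.2f}, 95% CI = [{m - h:.2f}, {m + h:.2f}]")
```

<p>Hard-coding the t value keeps the sketch dependency-free; for a different number of runs, the critical value for n - 1 degrees of freedom would have to be substituted.</p>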
    </sec>
    <sec id="sec-9">
      <title>9. Results of the experiment</title>
      <p>Accuracy was evaluated for each architectural configuration over ten independent
iterations to ensure statistical reliability. Figure 2 presents detailed accuracy results for each
architecture on the first dataset.</p>
      <p>Mean accuracy across all iterations was 65.4% (σ = 0.55) for standalone RCNN and 95.3%
(σ = 0.35) for RCNN+NBC. The stability of the results across both datasets indicates that the
architectures are robust to linguistic and contextual variations between disinformation domains.
Notably, the dual-layer RCNN+NBC approach consistently outperformed the baseline RCNN
implementation across all iterations and datasets.</p>
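<p>The 46% figure quoted for this comparison is a relative improvement over the baseline mean, not a difference in percentage points. The arithmetic can be checked from the reported means; the helper name below is illustrative.</p>

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


rcnn_acc, hybrid_acc = 65.4, 95.3  # mean accuracies (%) reported for the two architectures
gain = relative_improvement(rcnn_acc, hybrid_acc)
print(f"absolute: {hybrid_acc - rcnn_acc:.1f} percentage points, relative: {gain:.1%}")
```

<p>This prints an absolute gain of 29.9 percentage points and a relative gain of 45.7%, which rounds to the 46% stated in the text.</p>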
      <p>A detailed per-class analysis confirmed consistently high performance across all five
disinformation categories. Precision and recall values for each class remained in the range of
roughly 0.93–0.97, resulting in macro- and micro-averaged F1-scores close to 0.95. The most
frequent classification overlaps occurred between overt and subtle satire, reflecting their semantic
proximity and shared stylistic cues; however, such misclassifications accounted for less than five
percent of all cases. The results therefore demonstrate stable and balanced detection quality, with
no dominant bias toward any specific category.</p>
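<p>The per-class precision and recall values and the averaged F1-scores follow from a multi-class confusion matrix. A minimal sketch of these standard definitions is given below; the function names are illustrative and this is not the authors' evaluation code. For single-label multi-class data, the micro-averaged F1-score equals overall accuracy, which the sketch uses directly.</p>

```python
def per_class_scores(cm):
    """Return (precision, recall, F1) per class; cm[i][j] counts true class i predicted as j."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_micro_f1(cm):
    """Macro-F1 averages per-class F1; micro-F1 pools all decisions (equals accuracy here)."""
    scores = per_class_scores(cm)
    macro = sum(f1 for _, _, f1 in scores) / len(scores)
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    micro = correct / total
    return macro, micro
```

<p>For example, on a hypothetical two-class matrix <monospace>[[8, 2], [1, 9]]</monospace>, the macro-F1 is about 0.850 and the micro-F1 is 0.85. Off-diagonal cells such as overt-versus-subtle satire confusions appear in the false-positive count of one class and the false-negative count of the other.</p>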
      <p>Figure 3 presents accuracy results for the second dataset.</p>
      <p>Processing time was evaluated over multiple measurement iterations by recording the average
inference time required to classify a single sample. The hardware configuration was identical for
all architectural variants to ensure comparable results. Table 1 presents processing time
measurements across five independent iterations.</p>
      <p>Figure 3: Accuracy results for each architecture on US election 2020 dataset [created by the
authors].</p>
      <p>Processing time differs only slightly between the architectural approaches. The baseline
RCNN implementation achieved the lowest average processing time (125 ms), while the RCNN+NBC
configuration required 5.0% more time (131.2 ms).</p>
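<p>Average per-sample inference time and the relative overhead between architectures can be measured along the following lines. This is a sketch under assumptions: <monospace>predict</monospace> stands for any hypothetical single-sample inference callable, wall-clock time is taken with <monospace>time.perf_counter</monospace>, and <monospace>repeats=5</monospace> simply mirrors the five iterations reported in Table 1.</p>

```python
import time
from statistics import mean


def mean_latency_ms(predict, samples, repeats=5):
    """Average wall-clock milliseconds per classified sample, averaged over `repeats` passes."""
    per_pass = []
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)  # one single-sample inference call
        elapsed = time.perf_counter() - start
        per_pass.append(elapsed * 1000 / len(samples))
    return mean(per_pass)


def relative_overhead(baseline_ms, candidate_ms):
    """Extra time of the candidate as a fraction of the baseline time."""
    return candidate_ms / baseline_ms - 1


print(f"{relative_overhead(125.0, 131.2):.1%}")  # reported means for RCNN and RCNN+NBC
```

<p>With the reported means, the overhead is 0.0496, printed as 5.0%. Running both architectures through the same harness on the same hardware keeps the comparison fair, as the text requires.</p>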
      <p>To assess data efficiency, each architecture was trained on progressively larger training
sets until its accuracy exceeded 80%. This threshold was set through expert evaluation as the
minimum acceptable performance level for practical deployment. The results reveal substantial
differences in data efficiency between the architectural variants: the RCNN+NBC configuration
required only 500 samples to reach acceptable performance, a 90% reduction compared to the
baseline implementation, which required 5,000 samples.</p>
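<p>The data-efficiency procedure amounts to scanning increasing training-set sizes until accuracy first exceeds the 80% threshold. In the sketch below, <monospace>train_and_evaluate</monospace> is a hypothetical callback standing in for training and held-out evaluation, and the candidate sizes are left to the experimenter; the text does not specify the size grid that was used.</p>

```python
def min_samples_for_threshold(train_and_evaluate, sizes, threshold=0.80):
    """Smallest training-set size whose held-out accuracy exceeds `threshold`, else None.

    `train_and_evaluate(n)` is a hypothetical callback: train on n samples,
    return accuracy in [0, 1] on a fixed evaluation set.
    """
    for n in sorted(sizes):
        if train_and_evaluate(n) > threshold:
            return n
    return None  # threshold not reached within the tested sizes


def volume_saving(baseline_n, candidate_n):
    """Proportional reduction in required samples relative to the baseline."""
    return 1 - candidate_n / baseline_n


print(volume_saving(5000, 500))  # reported sample counts for RCNN and RCNN+NBC
```

<p>With the reported counts, the volume saving is 0.9, the 90% reduction stated above. Keeping the evaluation set fixed across sizes ensures that only the amount of training data varies.</p>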
      <p>To enable a comprehensive comparison, the individual performance metrics were normalized
relative to the baseline RCNN implementation and aggregated, as shown in Table 2.</p>
      <p>
        Time saving was calculated as the inverse of the normalized processing time, with the baseline
RCNN implementation assigned a value of 1.00. Accuracy values were normalized to the [0, 1] scale,
and volume saving was computed as the proportional reduction in the minimum required number of
samples relative to the baseline. The relative importance of each metric was determined through
expert assessment, with accuracy weighted 0.8 and time and volume savings weighted 0.1 each.
Combining the metrics with these weights by linear additive convolution (a weighted sum) yields
efficiency coefficients of 0.62 for standalone RCNN and 0.945 for RCNN+NBC.
      </p>
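<p>The efficiency coefficient is a weighted sum of the three normalized criteria. The minimal sketch below assumes that normalized accuracy is the mean accuracy divided by 100, that time saving is the baseline time divided by the candidate's time, and that volume saving is one minus the ratio of required samples. With the reported inputs, it yields about 0.623 and 0.948, close to the reported 0.62 and 0.945. The exact third decimal depends on rounding and normalization details that the text does not specify.</p>

```python
WEIGHTS = {"accuracy": 0.8, "time_saving": 0.1, "volume_saving": 0.1}  # expert weights from the text
BASE_TIME_MS = 125.0   # baseline RCNN mean inference time
BASE_SAMPLES = 5000    # baseline RCNN minimum training-set size


def normalized(accuracy_pct, time_ms, samples):
    """Map raw results to the three normalized criteria (assumed normalization)."""
    return {
        "accuracy": accuracy_pct / 100,               # [0, 1] scale
        "time_saving": BASE_TIME_MS / time_ms,        # inverse of normalized time; baseline = 1.00
        "volume_saving": 1 - samples / BASE_SAMPLES,  # reduction in required samples vs. baseline
    }


def efficiency(metrics, weights=WEIGHTS):
    """Linear additive convolution: weighted sum of the normalized criteria."""
    return sum(weights[name] * metrics[name] for name in weights)


rcnn = efficiency(normalized(65.4, 125.0, 5000))
hybrid = efficiency(normalized(95.3, 131.2, 500))
print(f"RCNN: {rcnn:.3f}, RCNN+NBC: {hybrid:.3f}")
```

<p>Because accuracy carries 80% of the weight, the coefficient is dominated by classification quality. The 5.0% time overhead reduces the RCNN+NBC score by only about 0.005.</p>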
      <p>These results indicate that the RCNN+Naive Bayes approach achieves an efficiency coefficient
52.5% higher than direct application of RCNN. The gain stems mainly from classification accuracy
and data efficiency, which outweigh the modest processing-time overhead. Among the evaluated
configurations, RCNN+NBC achieves the highest overall efficiency coefficient (0.945). It combines
the feature-extraction capabilities of the RCNN with a probabilistic Bayesian classification
framework, yielding high data efficiency while maintaining high classification accuracy.</p>
    </sec>
    <sec>
      <title>10. Conclusion</title>
      <p>
The objective of this research was to develop an effective dual-layer model for detecting falsified
textual information, based on a hybrid recurrent-convolutional neural network combined with naive
Bayesian classification. The study analysed the characteristics of falsified textual information in
socially oriented systems with significant user loads. Based on expert evaluation, a classification
structure of five categories of fake information was established, ranging from satirical content to
globally harmful news. In addition, a set of seven discriminative features for identifying fabricated
information was developed: emotional characteristics, rhetorical density coefficients, negative
construction frequency, contextual emotional indices, suspicion coefficients, message impact
factors, and sentiment magnitude vectors. These features form the basis for classification by the
naive Bayesian classifier, which constitutes the first layer of the proposed model.</p>
      <p>To improve computational efficiency, training and data processing were parallelized with
MapReduce on the Hadoop platform. This allowed training of the CNN component to be distributed
across multiple nodes, with the results subsequently aggregated by reducers. Experimental
verification was conducted on two datasets: news about the Russian-Ukrainian war (20,000 records)
and coverage of the 2020 US election (of equivalent volume). The multi-criteria evaluation used the
following weighting coefficients: accuracy (0.8), time savings (0.1), and data efficiency
(0.1).</p>
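<p>The text states that CNN training was distributed across nodes and that reducers aggregated the results, but it does not describe the aggregation rule. The sketch below assumes one-shot, sample-weighted parameter averaging. It uses a one-parameter linear model as a stand-in for the CNN weights and simulates the map and reduce phases in-process with <monospace>map</monospace> and <monospace>functools.reduce</monospace> rather than on Hadoop. All function names are hypothetical.</p>

```python
from functools import reduce


def local_train(shard, epochs=200, lr=0.1):
    """Map step: fit y ≈ w * x on one shard by gradient descent and emit (w, shard size)."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)  # d(MSE)/dw
        w -= lr * grad
    return w, len(shard)


def combine(acc, item):
    """Reduce step: accumulate the sample-weighted parameter sum and the sample count."""
    (w_sum, n_sum), (w, n) = acc, item
    return w_sum + w * n, n_sum + n


def map_reduce_train(shards):
    """Simulate one MapReduce round in-process: map over shards, then reduce."""
    mapped = map(local_train, shards)  # on Hadoop, each mapper would handle one input split
    w_sum, n_total = reduce(combine, mapped, (0.0, 0))
    return w_sum / n_total  # sample-weighted average of per-shard parameters


if __name__ == "__main__":
    # Toy shards drawn from y = 2x; a single weight stands in for the CNN parameters.
    shards = [[(0.5, 1.0), (1.0, 2.0)], [(0.2, 0.4), (0.8, 1.6), (1.0, 2.0)]]
    print(round(map_reduce_train(shards), 4))
```

<p>Weighting by shard size keeps larger splits from being underrepresented. Real CNN training would typically need several such rounds, or gradient aggregation, because averaging independently trained non-convex models in a single round is not guaranteed to work well.</p>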
      <p>Experimental results demonstrate substantial advantages of the proposed dual-layer approach.
The RCNN+NBC model achieved 95.3% accuracy compared to 65.4% for the baseline RCNN, a 46%
relative improvement. The data-efficiency gain is particularly significant: the dual-layer model
requires only 500 training samples to reach acceptable accuracy, versus 5,000 for the baseline
architecture, a 90% reduction in required data.</p>
      <p>Processing time increased modestly (by 5.0%), a cost offset by the substantial improvement in
classification quality. The overall efficiency coefficient of the dual-layer model reached 0.945,
versus 0.62 for the baseline implementation, a 52.5% improvement.</p>
      <p>Applying the dual-layer classification methodology extends baseline falsification detection
to include assessment of impact scale and analysis of fabrication intentionality. The results
confirm the feasibility of using the proposed approach to reduce the impact of disinformation in
socially oriented systems, particularly during crisis periods. Future research directions include
extending the methodology to multimodal content (video, images), investigating transfer learning
between different disinformation domains, and optimizing the architecture for real-time operation
in high-load systems.</p>
      <p>To promote transparency and reproducibility, the authors intend to release a de-identified
subset of the multilingual dataset together with the source code implementing the RCNN + Naive
Bayes training and evaluation procedures. These materials will be made publicly available after the
completion and publication of other research papers that rely on the same corpora, so that data
disclosure does not compromise the integrity of concurrent investigations. Prior to release, all
entries will undergo additional anonymization to remove user identifiers, timestamps, and message
metadata while preserving the authenticity of the text for linguistic analysis.</p>
    </sec>
    <sec>
      <title>11. Limitations and practical outlook</title>
      <p>Despite the high experimental accuracy, several limitations remain:</p>
      <p>Training data are restricted to Ukrainian and English; direct transfer to morphologically
distant languages (e.g., Arabic, Chinese) may reduce performance unless the model is adapted or
multilingual embeddings are used.</p>
      <p>Textual style and context differ across platforms (Telegram, Reddit, X), which may shift
feature distributions and affect classifier calibration.</p>
      <p>Although only public data were used, any large-scale deployment must ensure continued
anonymization and bias audits.</p>
      <p>Real-time monitoring requires adaptation to stream processing (e.g., Kafka + TensorFlow
Serving) and automatic model updates via concept-drift detection.</p>
      <p>Future work will address these challenges through multilingual fine-tuning, cross-platform
evaluation, and integration of the hybrid classifier into practical content-moderation pipelines
for governmental and media organizations.</p>
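<p>Concept-drift detection for automatic model updates could, for example, rely on the Drift Detection Method (DDM). DDM monitors a deployed classifier's online error rate p and its standard deviation s = sqrt(p(1 - p)/n). It signals a warning when p + s exceeds p_min + 2·s_min and drift when it exceeds p_min + 3·s_min. The sketch below illustrates this one possible choice and is not part of the described system. It assumes that misclassification outcomes eventually become available, for instance through moderator review. The class name is hypothetical.</p>

```python
import math


class DriftDetector:
    """Sketch of the Drift Detection Method (DDM) over a stream of prediction outcomes."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # no decisions before this many outcomes
        self.n = 0
        self.p = 0.0  # running error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one outcome (True = misclassified); return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.p += (float(is_error) - self.p) / self.n  # incremental mean of errors
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.min_samples > self.n:
            return "stable"
        if self.p_min + self.s_min > self.p + s:
            self.p_min, self.s_min = self.p, s  # lowest error level seen so far
        # Strict comparisons keep a zero-error stream from being flagged as drift.
        if self.p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

<p>On a "drift" signal, a deployment would retrain or replace the model and create a fresh detector. A "warning" signal is a natural point to start buffering recent labelled samples for that retraining.</p>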
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the Armed Forces of Ukraine for making it possible to carry out
this work during the Russian Federation's full-scale invasion of Ukraine. The authors also wish to
thank Kharkiv National University of Radio Electronics for providing licences for additional
software used to prepare the algorithms and the paper.</p>
    </sec>
    <sec id="sec-11">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly Edu and a submodule of Microsoft
365 to check grammar and spelling. After using these services, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Aïmeur, E., Amri, S., Brassard, G. (2023), "Fake news, disinformation and misinformation in social media: a review", Social Network Analysis and Mining, Vol. 13, Article 30. DOI: 10.1007/s13278-023-0102.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Anders, M. "Fake News Detection. European Data Protection Supervisor", available at: https://edps.europa.eu/press-publications/publications/techsonar/fake-news-detection_en (last accessed: 05.08.2025).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Vardhan, K. V., Josephine, B. M., Rama Rao, K. V. S. N. (2022), "Fake News Detection in Social Media Using Supervised Learning Techniques". 2022 International Conference on Sustainable Computing and Data Communication Systems, Erode, India, 7–9 April 2022: IEEE Xplore, P. 695–698. DOI: ICSCDS53736.2022.9760961.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Yuan, L., Jiang, H., Shen, H., Shi, L., Cheng, N. (2023), "Sustainable Development of Information Dissemination: A Review of Current Fake News Detection Research and Practice", Systems, Vol. 11 (9), Article 458. DOI: 10.3390/systems11090458.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Afanasieva, I., Golian, N., Golian, V., Khovrat, A., Onyshchenko, K. (2023), "Application of Neural Networks to Identify of Fake News". Computational Linguistics and Intelligent Systems (COLINS 2023): 7th International Conference, Kharkiv, 20–21 April 2023: CEUR workshop proceedings, Vol. 3396, P. 346–358, available at: https://ceur-ws.org/Vol-3396/paper28.pdf (last accessed: 05.08.2025).</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Rocha, Y. M., de Moura, G. A., Desiderio, G. A., de Oliveira, C. H., Lourenço, F. D., de Figueiredo Nicolete, L. D. (2023), "The impact of fake news on social media and its influence on health during the COVID-19 pandemic: a systematic review", Journal of Public Health, Vol. 31, P. 1007–1016. DOI: 10.1007/s10389-021-01658-z.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Karalis, M. (2024), "Fake leads, defamation and destabilization: how online disinformation continues to impact Russia's invasion of Ukraine", Intelligence and National Security, Vol. 39 (3), P. 512–524. DOI: 10.1080/02684527.2024.2329418.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Alonso, M. A., Vilares, D., Gómez-Rodríguez, C., Vilares, J. (2021), "Sentiment Analysis for Fake News Detection", Electronics, Vol. 10 (11), Article 1348. DOI: 10.3390/electronics10111348.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A., Ortega-Garcia, J. (2020), "Deepfakes and beyond: A Survey of face manipulation and fake detection", Information Fusion, Vol. 64, P. 131–148. DOI: 10.1016/j.inffus.2020.06.014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Bhatia, N. (2020), "Using transfer learning, spectrogram audio classification, and MIT app inventor to facilitate machine learning understanding", Massachusetts Institute of Technology, available at: https://dspace.mit.edu/handle/1721.1/127379 (last accessed: 05.08.2025).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Xia, T., Chen, X. A. (2020), "Discrete Hidden Markov Model for SMS Spam Detection", Applied Sciences, Vol. 10 (14), Article 5011. DOI: 10.3390/app10145011.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Yuslee, N. S., Abdullah, N. A. S. (2021), "Fake News Detection using Naive Bayes". 11th International Conference on System Engineering and Technology, Shah Alam, Malaysia, 6 November 2021: IEEE Xplore, P. 112–117. DOI: 10.1109/ICSET53708.2021.9612540.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Breuer, A., Eilat, R., Weinsberg, U. (2020), "Friend or Faux: Graph-Based Early Detection of Fake Accounts on Social Networks", The Web Conference, Taipei, 20–24 April 2020, P. 1287–1297. DOI: 10.1145/3366423.3380204.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Yakovlev, S., Khovrat, A., Kobziev, V., Uzlov, D. (2024), "Decision Support Algorithm in the Development of Information Sensitive Socially Oriented Systems". Workshop of IT-professionals on Artificial Intelligence, Cambridge, 25–27 September 2024: CEUR workshop proceedings, Vol. 3777, P. 315–326, available at: https://ceur-ws.org/Vol-3777/paper20.pdf (last accessed: 27.06.2025).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Choudhary, A., Arora, A. (2021), "Linguistic feature based learning model for fake news detection and classification", Expert Systems with Applications, Vol. 169, Article 114171. DOI: 10.1016/j.eswa.2020.114171.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>